THEORETICAL AND COMPUTATIONAL CHEMISTRY
Theoretical Biochemistry: Processes and Properties of Biological Systems
SERIES EDITORS
Professor P. Politzer, Department of Chemistry, University of New Orleans, New Orleans, LA 70148, U.S.A.
Professor Z.B. Maksić, Rudjer Bošković Institute, P.O. Box 1016, 10001 Zagreb, Croatia

VOLUME 1 Quantitative Treatments of Solute/Solvent Interactions
P. Politzer and J.S. Murray (Editors)
VOLUME 2 Modern Density Functional Theory: A Tool for Chemistry
J.M. Seminario and P. Politzer (Editors)
VOLUME 3 Molecular Electrostatic Potentials: Concepts and Applications
J.S. Murray and K. Sen (Editors)
VOLUME 4 Recent Developments and Applications of Modern Density Functional Theory
J.M. Seminario (Editor)
VOLUME 5 Theoretical Organic Chemistry
C. Párkányi (Editor)
VOLUME 6 Pauling's Legacy: Modern Modelling of the Chemical Bond
Z.B. Maksić and W.J. Orville-Thomas (Editors)
VOLUME 7 Molecular Dynamics: From Classical to Quantum Methods
P.B. Balbuena and J.M. Seminario (Editors)
VOLUME 8 Computational Molecular Biology
J. Leszczynski (Editor)
VOLUME 9 Theoretical Biochemistry: Processes and Properties of Biological Systems
L.A. Eriksson (Editor)
Edited by Leif A. Eriksson
Department of Quantum Chemistry, Uppsala University, 751 20 Uppsala, Sweden
ELSEVIER
2001
Amsterdam - London - New York - Oxford - Paris - Shannon - Tokyo
ELSEVIER SCIENCE B.V.
Sara Burgerhartstraat 25
P.O. Box 211, 1000 AE Amsterdam, The Netherlands

© 2001 Elsevier Science B.V. All rights reserved.

This work is protected under copyright by Elsevier Science, and the following terms and conditions apply to its use:

Photocopying
Single photocopies of single chapters may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for non-profit educational classroom use.

Permissions may be sought directly from Elsevier Science Global Rights Department, PO Box 800, Oxford OX5 1DX, UK; phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]. You may also contact Global Rights directly through Elsevier's home page (http://www.elsevier.nl), by selecting 'Obtaining Permissions'.

In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (+1) (978) 7508400, fax: (+1) (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 207 631 5555; fax: (+44) 207 631 5500. Other countries may have a local reprographic rights agency for payments.

Derivative Works
Tables of contents may be reproduced for internal circulation, but permission of Elsevier Science is required for external resale or distribution of such material. Permission of the Publisher is required for all other derivative works, including compilations and translations.

Electronic Storage or Usage
Permission of the Publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter.

Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher. Address permissions requests to: Elsevier Science Global Rights Department, at the mail, fax and e-mail addresses noted above.

Notice
No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.
First edition 2001

Library of Congress Cataloging in Publication Data
A catalog record from the Library of Congress has been applied for.

ISBN: 0-444-50292-0
ISSN: 1380-7323 (Series)
The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper). Printed in The Netherlands.
FOREWORD

Theoretical chemistry has been an area of tremendous expansion and development over the past decade. From an approach in which we could treat only a few atoms quantum mechanically or run fairly crude molecular dynamics simulations, it has grown into a discipline whose accuracy and predictive power have made it an essential complement to experiment in essentially all areas of science. One of the areas where the success of computational chemistry has perhaps been most profound is biochemistry/biophysics. With the development of faster and cheaper computers, an increasing number of algorithms whose cost scales linearly with system size (an advantage that grows the larger the system under study), and completely new methods and hybrids between methods, we are now able to investigate systems of 50-100 atoms with quantum chemical methods, even larger aggregates by combining QM and MM methods or by performing quantum-MD simulations, and systems of, say, 50000 atoms in large-scale classical MD simulations. As the model systems become increasingly realistic, direct comparison with experimental data becomes possible.

The intention of this volume is to give a flavour of the types of problems in biochemistry that theoretical calculations can solve at present, and to illustrate the tremendous predictive power these approaches possess. With these aspects in mind, I have tried to gather some of the leading scientists in the field of theoretical/computational biochemistry and let them present their work. You will hence find a wide range of computational approaches, from classical MD and Monte Carlo methods, via semi-empirical and DFT approaches on isolated model systems, to Car-Parrinello QM-MD and novel hybrid QM/MM studies. The systems investigated also cover a broad range: from membrane-bound proteins to various types of enzymatic reactions, as well as inhibitor studies, cofactor properties, solvent effects, transcription and radiation damage to DNA.
It is my hope that the work presented herein will provide as much pleasure in reading as I have had in editing the volume, and that it will help to stimulate discussion and further development of a truly fascinating field of science.

Leif A. Eriksson
June 2000
TABLE OF CONTENTS

Chapter 1. The Structure and Function of Blue Copper Proteins, U. Ryde, M.H.M. Olsson and K. Pierloot
1. Introduction
2. Methods
3. Geometry
3.1 The optimal geometry of the blue-copper active site
3.2 Trigonal and tetragonal Cu(II) structures
3.3 The sensitivity of the geometries to the theoretical method
3.4 Geometry optimisations in the protein
4. Electronic spectra
4.1 The electronic spectrum of plastocyanin
4.2 Correlation between structure and spectroscopy of copper proteins
4.3 The sensitivity of the calculated spectra on the theoretical method
5. Reorganisation energies
6. Reduction potentials
7. Related proteins
7.1 The binuclear CuA site
7.2 Cytochromes
7.3 Iron-sulphur clusters
8. Protein strain
9. Concluding remarks

Chapter 2. Myoglobin, D. Karancsi-Menyhárd, G. Keserű and G. Náray-Szabó
1. Introduction
2. Conformation and structural dynamics
3. Complexes with various ligands
4. Photodissociation
5. Recombination
6. Ligand migration

Chapter 3. Mechanisms for Enzymatic Reactions Involving Formation or Cleavage of O-O Bonds, P.E.M. Siegbahn and M.R.A. Blomberg
1. Introduction
2. Methods and models
3. Formation of O2
4. O-O bond cleavage
4.1 O-O bond activation in cytochrome oxidase
4.2 Heme peroxidases
4.3 O-O bond activation in methane monooxygenase
4.4 O-O bond activation in manganese catalase
4.5 O-O bond activation in isopenicillin N synthase
5. Conclusions

Chapter 4. Catalytic Reactions of Radical Enzymes, F. Himo and L.A. Eriksson
1. Introduction
2. Methodology
3. Galactose oxidase
4. Pyruvate formate-lyase
5. Ribonucleotide reductase
6. Concluding remarks

Chapter 5. Theoretical Studies of Coenzyme B12-Dependent Carbon-Skeleton Rearrangements, D.M. Smith, S.D. Wetmore and L. Radom
1. Introduction
2. Background
2.1 Vitamin B12: What is it?
2.2 Coenzyme B12: What does it do?
2.3 The bound free-radical hypothesis: How does coenzyme B12 work?
2.4 The radical rearrangement mechanism
3. Evaluation of theoretical techniques
4. 2-Methyleneglutarate mutase
4.1 Fragmentation-recombination
4.2 Addition-elimination
4.3 Facilitation by protonation
5. Methylmalonyl-CoA mutase
5.1 Fragmentation-recombination
5.2 Addition-elimination
5.3 Facilitation by protonation
6. Glutamate mutase
6.1 Fragmentation-recombination pathway for the rearrangement of the aminopropyl radical
6.2 Rearrangement of the iminopropyl radical
6.3 Hydride ion removal from the aminopropyl radical
7. Comparison of the models for B12-dependent carbon-skeleton mutases
8. The partial-proton-transfer concept
9. Conclusions

Chapter 6. Simulations of Enzymatic Systems: Perspectives from Car-Parrinello Molecular Dynamics Simulations, P. Carloni and U. Röthlisberger
1. Introduction
2. Principles of the Car-Parrinello method
3. Car-Parrinello modeling of biological systems
4. Applications to non-enzymatic systems
4.1 Nucleic acids
4.2 Heme-based proteins
4.3 Cyclic peptides and ion channels
4.4 Photosensitive proteins
5. Applications to enzymes
5.1 Introduction
5.2 Test cases
5.2.1 Human carbonic anhydrase II (HCAII)
5.2.2 Serine proteases
5.3 Enzymes as targets for pharmaceutical intervention
5.3.1 HIV-1 protease (HIV-1 PR)
5.3.2 HIV-1 reverse transcriptase
5.3.3 Herpes simplex virus type 1 thymidine kinase: a target for gene-therapy based anticancer drugs
5.3.4 Conclusion
5.4 Rational design of biomimetic catalysts by hybrid QM/MM Car-Parrinello simulations of galactose oxidase
6. Outlook

Chapter 7. Computational Enzymology: Protein Tyrosine Phosphatase Reactions, K. Kolmodin, V. Luzhkov and J. Åqvist
1. Introduction
2. Protein tyrosine phosphatase reactions
2.1 Protein tyrosine phosphatases
2.2 The PTPase reaction mechanism
3. The empirical valence bond method
3.1 EVB and PTPase reaction
3.2 Calibration of the EVB potential
3.3 Simulation details
4. Reaction free energy profile of the LMPTP
4.1 Step 1: Substrate dephosphorylation
4.2 Binding free energy calculations
4.3 Step 2: Phosphoenzyme hydrolysis
4.4 Reaction mechanism for mutants lacking the general acid residue
4.5 The pKa of the catalytic cysteine is different in LMPTP and PTP1B
4.6 Summary
5. Substrate trapping in cysteine to serine mutated PTPases
6. Prediction of a ligand induced conformational change in the active site of CDC25A
7. Kinetic isotope effects in phosphoryl transfer reactions
7.1 Calculations of heavy atom kinetic isotope effect in phosphate monoester hydrolysis

Chapter 8. Monte Carlo Simulations of HIV-1 Protease Binding Dynamics and Thermodynamics with Ensembles of Protein Conformations: Incorporating Protein Flexibility in Deciphering Mechanisms of Molecular Recognition, G.M. Verkhivker, D. Bouzida, D.K. Gehlhaar, P.A. Rejto, L. Schaffer, S. Arthurs, A.B. Colson, S.T. Freer, V. Larson, B.A. Luty, T. Marrone and P.W. Rose
1. Structural models for molecular recognition
2. Structure-based analysis of HIV-1 protease-inhibitor binding
2.1 Structure-based analysis of HIV-1 protease-SB203386 inhibitor binding
3. Structure-based computational models of ligand-protein binding dynamics and molecular docking
4. Computer simulations of ligand-protein binding
4.1 Computer simulations of ligand-protein docking
4.2 Monte Carlo equilibrium simulations of ligand-protein thermodynamics
4.3 Monte Carlo data analysis with the weighted histogram method
5. Computer simulations of HIV-1 protease-inhibitor binding dynamics and thermodynamics
6. Conclusions

Chapter 9. Modelling G-Protein Coupled Receptors, C. Higgs and C.A. Reynolds
1. Introduction
2. Receptor structure and modelling
3. Ligand binding
4. Structural changes
5. Receptor-G-protein interaction
6. GPCR dimerisation
7. Conclusions

Chapter 10. Protein-DNA Interactions in the Initiation of Transcription: The Role of Flexibility and Dynamics of the TATA Recognition Sequence and the TATA Box Binding Protein, N. Pastor and H. Weinstein
1. TBP and transcription
1.1 Structural biology of TBP
1.2 Kinetics and thermodynamics of TATA box recognition and binding
2. TATA box sequence specific recognition
2.1 The role of direct readout
2.2 The energy cost of DNA bending: an alternative sequence-dependence mechanism
2.2.1 Stable bends
2.2.2 Flexibility
2.2.3 Free energy calculations
2.3 The dehydration of the interface
2.4 Integration of the various contributions into mechanistic criteria for the formation of TBP-DNA complexes
3. Dynamic effects in complex stabilization
4. Towards the preinitiation complex assembly
5. Concluding remarks

Chapter 11. A Multi-Component Model for Radiation Damage to DNA from its Constituents, S.D. Wetmore, L.A. Eriksson and R.J. Boyd
1. Introduction
2. Characterization of DNA radiation products
2.1 Pyrimidine and purine radiation products: close agreement between experiment and theory
2.2 Pyrimidine and purine radiation products: problematic cases
2.3 New mechanism for radiation damage in cytosine monohydrate
2.4 Sugar radicals in irradiated DNA
3. Full DNA studies
3.1 The primary radicals
3.2 The secondary radicals
3.3 The effects of water on radical formation in DNA
3.4 Major radical products formed in irradiated DNA
3.5 DNA cations and secondary radicals
3.6 DNA anions and secondary radicals
4. A multi-component model for DNA radiation damage
5. Concluding remarks

Chapter 12. New Computational Strategies for the Quantum Mechanical Study of Biological Systems in Condensed Phases, C. Adamo, M. Cossi, N. Rega and V. Barone
1. Introduction
2. The density functional model
2.1 Functionals of the electronic density
2.2 The PBE functional
2.3 Beyond the PBE functional
2.4 A further improvement: the hybrid HF/DF methods
2.5 Beyond the GGA functionals
2.6 Some tests
2.6.1 EPR hyperfine coupling constants
2.6.2 NMR absolute shieldings
2.6.3 General comments
3. Vibrational averaging
4. Solvent effects
4.1 Outline of the PCM
4.2 Extension to large solutes
4.3 Some examples
5. Applications
5.1 Conformational analysis including solvent effects
5.2 Characterization of organic free radicals. Structure and magnetic properties
5.2.1 Glycine radical
5.2.2 5,6-dihydro-6-thymyl and 5,6-dihydro-5-thymyl radicals
5.2.3 Pyrrolidine-1-oxyl and imidazoline-1-oxyl radicals
6. Concluding remarks
Chapter 13. Modelling Enzyme-Ligand Interactions, M.J. Ramos, A. Melo and E.S. Henriques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2. Strategies in enzyme-ligand design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Receptor homology-built models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Mapping the binding region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Assembling the ligand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Docking the ligand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 Scoring the ligand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Refining the enzyme-ligand structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3. The enzyme-ligand complex in motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
539 539 540 541 543 545 548 550 553 555
xiii 3.1 Monte Carlo and molecular dynamics simulations . . . . . . . . . . . . . . . 3.2 Continuum electrostatic methods and Brownian dynamics ...... 3.3 Rigorous free energy simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Approximate free energy simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. A quantum insight into the study of enzyme-ligand interactions ...... 4.1 Quantum mechanical methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Hybrid Q M / C M methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C h a p t e r 14. The Q M / M M Approach to Enzymatic Reactions, A.J. Mulholland . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Simulation approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2. Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Basic theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Q M / M M partitioning schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3. Q M / M M methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
3.1 Method development and testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. Techniques for reaction modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 Optimization of transition structures and reaction pathways .... 4.2 Activation free energies, conformational behaviour and dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5. Practical aspects of modelling enzyme reactions . . . . . . . . . . . . . . . . . . . . . . . . 5.1 Choice and preparation of the starting structure . . . . . . . . . . . . . . . . . . 5.2 Choice of theoretical model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.1 Performance of semiempirical Q M methods . . . . . . . . . . . . . 5.3 Definition of the Q M region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Mechanistic questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6. Some recent applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1 Para-hydroxybenzoatehydroxylase (PHBH) . . . . . . . . . . . . . . . . . . . . . 6.2 Citrate synthase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3 H u m a n immunodeficiency virus protease . . . . . . . . . . . . . . . . . . . . . . . . . 6.4 Enolase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5 Malate dehydrogenase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.6 Lactate dehydrogenase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.7 Papain . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.8 Influenza neuraminidase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.9 cAMP-dependent protein kinase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.10 Chorismate mutase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C h a p t e r 15. Quinones and Quinoidal Radicals in Photosynthesis, R.A.
555 559 561 565 568 569 572 578
597 597 599 603 603 607 610 614 615 618 618 621 625 625 627 628 629 630 631 631 635 639 640 641 641 643 644 644 645 646
xiv Wheeler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Plant photosystem II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Bacterial photosynthetic reaction centers . . . . . . . . . . . . . . . . . . . . . . . . . . . 2. Tests of computational methods for calculating properties of quinoidal radicals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Tyrosyl radical and its phenoxyl and p-cresyl radical models... 2.2 Para-benzoquinoneand its semiquinone radical anion .......... 3. Calculated properties of quinoidal radicals important in photosynthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Plastoquinones and their radicals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Menaquinones and their semiquinone radical anions . . . . . . . . . . . . 3.3 Ubiquinones and their semiquinone radical anions . . . . . . . . . . . . . . . . 4. Semiquinone radical anions in plant photosystem II . . . . . . . . . . . . . . . . . . . . 5. Conclusions and future directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1 Retrospective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Future promise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
655 655 657 658 659 660 665 670 670 674 677 683 684 685 685
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
691
Subject Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
695
L.A. Eriksson (Editor)
Theoretical Biochemistry - Processes and Properties of Biological Systems
Theoretical and Computational Chemistry, Vol. 9
© 2001 Elsevier Science B.V. All rights reserved
Chapter 1
The structure and function of blue copper proteins
Ulf Rydeᵃ, Mats H. M. Olssonᵃ, and Kristine Pierlootᵇ*
ᵃDepartment of Theoretical Chemistry, Lund University, Chemical Centre, P. O. Box 124, S-221 00 Lund, Sweden, e-mail: [email protected]
ᵇDepartment of Chemistry, University of Leuven, Celestijnenlaan 200F, B-3001 Heverlee-Leuven, Belgium
Theoretical investigations of the structure and function of the blue copper proteins and the dimeric CuA site are described. We have studied the optimum vacuum geometry of oxidised and reduced copper sites, the relative stability of trigonal and tetragonal Cu(II) structures, the relation between the structure and electronic spectra, the reorganisation energy, and reduction potentials. We also compare their electron-transfer properties with those of cytochromes and iron-sulphur clusters. Our calculations give no support to the suggestion that strain plays a significant role in the function of these proteins; on the contrary, our results show that the structures encountered in the proteins are close to their optimal vacuum geometries (within 7 kJ/mole) and that the favourable properties are achieved by an appropriate choice of ligands. We use the density functional B3LYP method for the geometries; multiconfigurational second-order perturbation theory (CASPT2) for calculations of accurate energies and spectra; point-charge models, continuum approaches, and combined classical and quantum chemical methods for the environment; and classical force-field calculations for estimation of dynamic effects and free energies.
*This investigation has been supported by grants from the Swedish Natural Science Research Council, the Flemish Science Foundation, the Concerted Action of the Flemish Government, and by the European Commission through the TMR program (grant ERBFMRXCT960079). It has also been supported by computer resources of the Swedish Council for Planning and Coordination of Research, Parallelldatorcentrum at the Royal Institute of Technology, Stockholm, the National Supercomputer Centre at the University of Linköping, the High Performance Computing Center North at the University of Umeå, and Lunarc at Lund University.

1. INTRODUCTION

The blue copper proteins, or cupredoxins, are a group of proteins that exhibit a number of unusual properties, viz. a bright blue colour, a narrow hyperfine splitting in the electron spin resonance (ESR) spectra, and high reduction potentials [1-3]. Moreover, crystal structures of these proteins show an extraordinary cupric geometry: the copper ion is bound to the protein in an approximately trigonal plane formed by a cysteine (Cys) thiolate group and two histidine (His) nitrogen atoms. The coordination sphere in most blue copper sites is completed by one or two axial ligands, typically a methionine (Met) thioether group, but sometimes also a backbone carbonyl oxygen atom (in the azurins) or, instead, an amide oxygen atom from the side chain of glutamine (in the stellacyanins) [1-4].

The blue copper proteins serve as electron-transfer agents. Their distorted trigonal geometry is intermediate between the tetrahedral coordination preferred by Cu(I) and the tetragonal geometry of most Cu(II) complexes. As a result, the change in geometry when Cu(II) is reduced to Cu(I) is small [2,3,5], which gives a small reorganisation energy and allows for a high rate of electron transfer [6]. These unusual properties, unprecedented in the chemistry of small inorganic complexes, led as early as the 1960s to the proposal that the protein forms a rigid structure, which forces the Cu(II) ion into a coordination geometry more similar to that preferred by Cu(I) [7,8]. These hypotheses were later extended into general theories for metalloproteins, the entatic state theory [9,10] and the induced rack theory [11,12], which suggest that the protein forces the metal centre into a catalytically poised state. However, this suggestion has recently been challenged [13,14]. In particular, we have shown by quantum chemical calculations that the cupric geometry in the blue copper proteins is very close to the optimal vacuum structure of a Cu(II) ion with the same ligands [14]. Why, then, are the properties of the blue copper proteins so unusual, if not because of protein strain? During the last five years, we have tried to answer this question using theoretical methods.
In this review, we describe our results and discuss them in relation to the strain hypotheses. We also compare the blue copper proteins with related metal sites, such as the CuA dimer, cytochromes, and iron-sulphur clusters. Altogether, this illustrates how theoretical methods, ranging from high-level quantum chemical calculations to purely classical simulations, can be used to study and solve biochemical problems.
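The introduction links a small reorganisation energy to a high electron-transfer rate [6] without stating the relation. A minimal sketch, assuming the standard classical Marcus expression (not given in this chapter), in which the rate is proportional to exp[-(ΔG° + λ)²/(4λRT)]; the electronic-coupling prefactor is omitted, and the λ values in the example are hypothetical, chosen only to show the trend.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def marcus_activation_factor(delta_g: float, reorg: float,
                             temperature: float = 298.15) -> float:
    """Classical Marcus activation factor exp(-(dG + lambda)^2 / (4 lambda R T)).

    Energies are in kJ/mol. The electronic-coupling prefactor is left out,
    so only the relative effect of the reorganisation energy is captured.
    """
    if reorg <= 0.0:
        raise ValueError("reorganisation energy must be positive")
    barrier = (delta_g + reorg) ** 2 / (4.0 * reorg)  # activation free energy
    return math.exp(-barrier / (R_KJ * temperature))


if __name__ == "__main__":
    # Hypothetical self-exchange reaction (dG = 0): small versus large lambda
    for lam in (60.0, 150.0):
        factor = marcus_activation_factor(0.0, lam)
        print(f"lambda = {lam:5.1f} kJ/mol -> activation factor {factor:.2e}")
```

For a self-exchange reaction the barrier reduces to λ/4, so any geometric change that lowers λ raises the rate exponentially, which is the qualitative point made above.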
2. METHODS
It is not yet possible to perform accurate quantum chemical calculations on a whole protein. Therefore, models have to be constructed that are as realistic as possible but at the same time computationally tractable. We have used a number of techniques, ranging from high-level quantum chemical calculations on small models of the active site to classical simulations of the full protein. At the intermediate level, we have described the active site by quantum chemical methods and incorporated the effects of the surrounding protein and solvent by a variety of methods, e.g. a point-charge model, a dielectric continuum, or a classical force field. Each method has its strengths and weaknesses, and the choice of method is largely determined by the questions of interest and the available computer resources.

For quantum chemical geometry optimisations, we have used the density functional method B3LYP, as implemented in the Mulliken, Turbomole, or Gaussian software packages [15-18]. Hybrid density functional methods have been shown to give geometries as good as or better than those of correlated ab initio methods for first-row transition metal complexes [19-21], and the B3LYP method in particular seems to give the most reliable results among the widely available density functional methods [22]. In most calculations, we have used a basis set of double-ζ quality enhanced with p, d, and f functions for copper and iron [14,23,24] and the 6-31G* basis sets for the other atoms [25]. This basis set is denoted DZpdf/6-31G*.

For calculation of accurate energies, geometries, and electronic spectra, the CASSCF-CASPT2 approach (second-order perturbation theory with a multiconfigurational reference state) was used [27]. This method has been shown to give reliable results for organic molecules as well as transition-metal complexes, with an error consistently less than 2500 cm⁻¹ [28]. Generally contracted basis sets of the atomic natural orbital (ANO) type were used in these calculations [29]. They have the virtue of being compact while at the same time optimised to include as much correlation as possible at a given size. Because of the size of the systems studied, basis sets of moderate size have been used, including up to f-type functions on Cu and a d-type function on S, but often no polarisation functions on C, N, and H. The choice of the active orbital space for the CASSCF calculations is a crucial step, and it has turned out to be especially difficult in these proteins and other systems containing a Cu-thiolate bond.
From earlier studies it was known that, in complexes of first-row transition metal ions with many 3d electrons, the active space should include one correlating orbital for each of the doubly occupied 3d orbitals [28]. Therefore, the starting active space contains 10 orbitals (3d and 3d'). In addition, it is necessary to add the 3p orbitals on SCys to describe correctly the covalent character of the Cu-SCys bond, and also 2p and 3p orbitals on nitrogen and sulphur to describe charge-transfer states. The final active space therefore contains 11 or 12 active orbitals (12 active orbitals are at present the upper limit for the CASPT2 method). The CASSCF wavefunction is used as the reference function in a second-order estimate of the remaining dynamical correlation effects. All valence electrons were correlated in this step, as were the 3s and 3p shells on copper. Relativistic corrections (the Darwin and mass-velocity terms) were added to all CASPT2 energies; they were obtained at the CASSCF level using first-order perturbation theory. A level shift (typically 0.3 Hartree) was added to the zeroth-order Hamiltonian in order to remove intruder states [30]. Transition moments were computed with the CAS state-interaction method [31] at the CASSCF level. They were combined with CASPT2 excitation energies to obtain oscillator strengths. The CASSCF-CASPT2 calculations were performed with the MOLCAS quantum chemistry software [32]. For further details, we refer to the original articles [14,33-39].

In the quantum chemical calculations, only the copper ion and its ligands were included. Several models were tested for the ligands: histidine was modelled by ammonia, imidazole (Im), or ImCH3; cysteine by SH-, SCH3-, or SC2H5-; methionine by SH2, S(CH3)2, or S(CH3)(CH2CH3); and amide ligands by formaldehyde, formamide, or acetamide. In the calculations on azurin, the main-chain linkage between the histidine ligand and the backbone amide group was also included (ImCH2CH2NHCOCH3). We have shown that converged geometries and spectra are obtained with imidazole as a model for histidine and with methyl groups on the sulphur atoms modelling cysteine and methionine [14,26,36]. Smaller models contain polar hydrogen atoms, which form artificial hydrogen bonds that may strongly distort the structure and change the energies. Fortunately, the explosive increase in computer power over the last two years has made it unnecessary to compromise on the ligand models. Before that, however, we often had to use smaller models and enforce symmetry to make the calculations feasible.

Naturally, calculations in vacuum cannot reproduce all the properties of a metal site in a protein. The simplest way to include the surroundings in quantum chemical calculations is to assign a charge to each atom in the protein (and possibly also to an equilibrated ensemble of solvent molecules) and include the field of these charges in the calculations. This method was used in most of the calculations of electronic spectra [33,34,36,39]. The best available crystal structures were used as starting coordinates.
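In the point-charge approach just described, the protein enters the quantum chemical calculation only through the electrostatic potential and field of its atomic charges. The sketch below is a hypothetical helper, not part of any program cited here: it evaluates that potential and field at a single point in atomic units. In an actual calculation the same charges (e.g. from the Amber libraries) are added to the one-electron Hamiltonian rather than evaluated at one point.

```python
import math
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


def potential_and_field(point: Vec3,
                        charges: Iterable[Tuple[float, Vec3]]) -> Tuple[float, Vec3]:
    """Electrostatic potential and field at `point` from point charges.

    Atomic units are assumed: charges in e, positions in bohr, so the
    potential is sum(q / r) and the field is sum(q * r_vec / r**3).
    """
    pot = 0.0
    ex = ey = ez = 0.0
    for q, (x, y, z) in charges:
        dx, dy, dz = point[0] - x, point[1] - y, point[2] - z
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        if r == 0.0:
            raise ValueError("evaluation point coincides with a point charge")
        pot += q / r
        r3 = r ** 3
        ex += q * dx / r3
        ey += q * dy / r3
        ez += q * dz / r3
    return pot, (ex, ey, ez)


if __name__ == "__main__":
    # Hypothetical environment: a pair of opposite charges near the origin
    environment = [(0.4, (3.0, 0.0, 0.0)), (-0.4, (4.0, 0.0, 0.0))]
    print(potential_and_field((0.0, 0.0, 0.0), environment))
```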
Hydrogen atoms and solvent molecules (in the form of a spherical cap) were added, and their positions were equilibrated with the Amber suite of programs [40]. The final coordinates were used in the spectra calculations together with charges from the Amber libraries [40].

Another method to include solvent effects in quantum chemical calculations is the continuum approach. In the polarisable continuum model (PCM), a molecule is placed in a cavity formed by overlapping atom-centred spheres, surrounded by a dielectric medium [41]. The induced polarisation of the surroundings is represented by point charges distributed on the surface of the cavity, and the field of these charges affects the wavefunction. Thus, solvation effects are included in the wavefunction in a self-consistent manner. In addition to this electrostatic term, the PCM includes terms that affect only the solute energy, viz. cavitation, dispersion, and exchange energy [42]. We have used the conductor-like or self-consistent isodensity PCM methods [43,44] as implemented in the Gaussian 98 software [17]. Further details are given in the original references [35,45]. Continuum methods can also be used in classical calculations including the full protein. In such calculations, each atom in the protein is assigned a point charge.
For the active site, the charges may be taken from a quantum chemical calculation [46]. Moreover, the protein is assigned a low dielectric constant (typically 4), whereas the surrounding solvent is assigned the value of water (~80). The Poisson-Boltzmann equation is then solved numerically on a grid covering the protein and parts of the solvent [47]. Naturally, this method is less accurate than the PCM, but it can be used for much larger systems. We have used this method, as implemented in the MEAD software [48], for the calculation of reduction potentials [45,49]. A related method is the protein-dipole Langevin-dipole method, in which water molecules are represented by Langevin dipoles on a grid and polarisation effects are included [50]. It has been used successfully in several investigations of the reorganisation energy and reduction potentials of proteins [50-53], but we have only used it in some exploratory calculations.

We have run several types of classical simulations on blue copper proteins, e.g. energy minimisations, molecular dynamics simulations, free energy calculations, and potential of mean force computations [33,34,36,38,39,54], all with the Amber software [40]. In such calculations, the copper ion and its ligands pose a special problem, since they are not included in the standard force fields. For crude calculations (especially when the metal site is kept fixed), it may be sufficient to determine an appropriate set of charges for the copper ion and its ligands from a fit to the electrostatic potential calculated by quantum chemical methods [46] (there is a significant transfer of charge from the ligands to the copper ion). For more accurate calculations, a full force-field parameterisation of the copper ion and its ligands has to be performed, involving charges, force constants, and equilibrium parameters.
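Fitting charges to a quantum chemical electrostatic potential, as mentioned above for crude force-field treatments of the copper site, is a constrained linear least-squares problem. A minimal sketch, assuming the grid points and potential values come from a separate quantum chemical calculation: it is a plain unrestrained fit with the total charge imposed through a Lagrange multiplier, all names are hypothetical, and the specific fitting scheme of [46] is not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_esp_charges(atoms: Sequence[Vec3], grid: Sequence[Vec3],
                    esp: Sequence[float], total_charge: float) -> List[float]:
    """Least-squares atomic charges reproducing `esp` on `grid` (atomic units),
    with the sum of the charges constrained to `total_charge`."""
    n = len(atoms)
    # Design matrix: a[k][i] = 1 / |grid_k - atom_i|
    a = [[1.0 / math.dist(g, at) for at in atoms] for g in grid]
    # Augmented normal equations [[A^T A, 1], [1^T, 0]] [q, lam] = [A^T V, Q]
    lhs = [[sum(a[k][i] * a[k][j] for k in range(len(grid))) for j in range(n)] + [1.0]
           for i in range(n)]
    lhs.append([1.0] * n + [0.0])
    rhs = [sum(a[k][i] * esp[k] for k in range(len(grid))) for i in range(n)]
    rhs.append(total_charge)
    return _solve(lhs, rhs)[:n]  # drop the Lagrange multiplier
```

If the reference potential is generated from known charges, the fit recovers them exactly; with real quantum chemical potentials the residual reflects charge penetration and polarisation that point charges cannot describe.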
We have performed a full force-field parameterisation of this kind for the copper sites in oxidised and reduced plastocyanin and in oxidised nitrite reductase (both the type 1 and type 2 copper sites) [54].

The most satisfactory way to include the effect of the surroundings in quantum chemical calculations is to combine a quantum chemical and a classical program, the QC/MM approach. In this method, the interesting part of the system is treated by quantum chemical methods, whereas the rest is treated by classical methods. Classical forces on the quantum atoms are added to the quantum chemical forces before the atoms are moved (either in a geometry optimisation or in a molecular dynamics simulation). If there are bonds between the quantum chemical and classical systems, special measures have to be taken. This approach is very popular at present, and many different variants have been suggested [55-60]. We have recently updated our QC/MM program COMQUM [57] to incorporate the density functional methods of Turbomole [16] and the accurate force field methods in Amber [40]. This program has been used to optimise the geometry of the copper site in three blue copper proteins, to estimate strain energies, and to calculate reorganisation energies [26].
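The central step of the QC/MM optimisation described above, adding the classical forces on the quantum atoms to the quantum chemical forces before the atoms are moved, can be sketched as a simple steepest-descent loop. This is an illustrative assumption, not the COMQUM algorithm: both force callbacks are hypothetical stand-ins for the quantum chemical and classical programs, and junction (link) atoms, relaxation of the classical region, and quasi-Newton updates are all omitted.

```python
from typing import Callable, List, Sequence

ForceFn = Callable[[Sequence[float]], List[float]]


def qcmm_optimise(coords: Sequence[float], qm_forces: ForceFn, mm_forces: ForceFn,
                  step: float = 0.1, tol: float = 1e-8,
                  max_iter: int = 10000) -> List[float]:
    """Steepest-descent relaxation of the quantum atoms in which the classical
    forces are added to the quantum chemical forces before each move.

    `coords` is a flat list of Cartesian coordinates; the two callbacks are
    hypothetical placeholders for calls to a QM and an MM program.
    """
    x = list(coords)
    for _ in range(max_iter):
        total = [fq + fm for fq, fm in zip(qm_forces(x), mm_forces(x))]
        if max(abs(f) for f in total) < tol:
            return x
        x = [xi + step * fi for xi, fi in zip(x, total)]  # move along the summed force
    raise RuntimeError("optimisation did not converge")


if __name__ == "__main__":
    # Toy 1D example: harmonic "QM" and "MM" forces pulling towards 1.0 and 4.0
    qm = lambda x: [-2.0 * (x[0] - 1.0)]
    mm = lambda x: [-1.0 * (x[0] - 4.0)]
    print(qcmm_optimise([0.0], qm, mm))
```

In the toy example the two terms compete, and the optimised coordinate settles at the force-weighted compromise (2.0), which is the same mechanism by which the surroundings can shift a quantum chemical geometry away from its vacuum optimum.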
Figure 1. A comparison of the optimised structure of Cu(Im)2(SCH3)(S(CH3)2)+ [14] and the crystal structure of plastocyanin (shaded) [4].
3. GEOMETRY
3.1 The optimal geometry of the blue copper active site

According to the induced-rack and the entatic state hypotheses [10,12], the Cu(II) coordination sphere in the blue copper proteins is strained into a Cu(I)-like structure. Such hypotheses are hard to test experimentally, but with theoretical methods it is quite straightforward. The actual coordination preferences of the copper ion can be determined by optimising the geometry of the ion and its ligands in vacuum; if the optimised structure is almost the same as in the proteins, strain is probably of minor importance for the geometry.

We have optimised the geometry of Cu(Im)2(SCH3)(S(CH3)2)+, a realistic model of the oxidised prototypical CuHis2CysMet blue copper site (e.g. in plastocyanin), using the density functional B3LYP method [14]. The result is sensational but convincing. As can be seen in Figure 1 and Table 1, the optimised geometry is virtually identical to the one observed experimentally in the blue copper proteins. Almost all bond lengths and bond angles around the copper ion are within the range observed in crystal structures, and most of them are close to the average values for the proteins. Only two small, but significant, differences can be observed: a slightly too long Cu-SCys bond and a slightly too short Cu-SMet bond. These differences can be fully explained by the dynamics of the system, which increase the average Cu-SMet bond length by at least 10 pm [54], and by deficiencies in the theoretical method (the more accurate CASPT2 method gives a 7 pm shorter Cu-SCys bond and a 7 pm longer Cu-SMet bond [14]). Equally convincing results have been obtained for the optimal structure of Cu(Im)2(SCH3)(OCCH3NH2)+, a model of the ligand sphere of oxidised stellacyanin [33], as is also shown in Table 1. It should be noted that no information from the crystal structures has been used to obtain these structures; they are entirely an effect of the chemical preferences of the copper ion and its four ligands. Thus, the cupric structure in the oxidised blue copper proteins is clearly neither unnatural nor strained.

Table 1. Comparison of the geometry of optimised models and crystal structures of blue copper proteins [14,34,54]. Ax is the axial ligand and φ the angle between the SCys-Cu-Ax and N-Cu-N planes. Distances to Cu are given in pm; angles subtended at Cu and φ are given in degrees.

Model | Cu-SCys | Cu-N | Cu-Ax | SCys-Cu-N | SCys-Cu-Ax | N-Cu-N | N-Cu-Ax | φ
Cu(Im)2(SCH3)(S(CH3)2)+ (a) | 218 | 204 | 267 | 120-122 | 116 | 103 | 94-95 | 90
Plastocyanin, oxidised | 207-221 | 189-222 | 278-291 | 112-144 | 102-110 | 96-104 | 85-108 | 77-89
Cu(Im)2(SCH3)(S(CH3)2), reduced model | 232 | 214-215 | 237 | 105-108 | 115 | 109 | 107-113 | 89
Cu(Im)2(SCH3)(S(CH3)2), reduced model (b) | 227 | 205-210 | 290 | 112-120 | 99 | 119 | 100-101 | 88
Plastocyanin, reduced | 211-217 | 203-239 | 287-291 | 110-141 | 99-114 | 91-118 | 83-110 | 74-80
Cu(Im)2(SH)(S(CH3)2)+ (c) | 223 | 205-206 | 242 | 97-141 | 103 | 100 | 95-126 | 62
Nitrite reductase, oxidised | 208-223 | 193-222 | 246-270 | 98-140 | 103-109 | 96-102 | 84-138 | 56-65
Cu(Im)2(SCH3)(OCCH3NH2)+ (a) | 217 | 202-206 | 224 | 122-125 | 113 | 103 | 92-95 | 88
Stellacyanin, oxidised | 211-218 | 191-206 | 221-227 | 116-141 | 101-107 | 97-105 | 87-102 | 82-86

(a) Trigonal structure. (b) The Cu-SMet bond length was constrained to 290 pm. (c) Tetragonal structure.

Geometry optimisations of the corresponding reduced models are more complicated, because the lower charge on the copper ion leads to weaker bonds, so that the geometry of the complex is very sensitive to electrostatic interactions with the surrounding protein (e.g. hydrogen bonds) [14]. The optimum vacuum structure of the Cu(Im)2(SCH3)(S(CH3)2) complex is more tetrahedral than the active site in reduced plastocyanin, and it has a short Cu-SMet bond (237 pm, see Table 1) [14]. However, the potential surface of the Cu-SMet bond is extremely flat (cf. Figure 10). If this bond length is fixed at the crystal value (290 pm) and the complex is reoptimised, a structure is obtained that is virtually identical to the crystal structure of reduced plastocyanin. Interestingly, this structure is only 4 kJ/mole less stable than the optimal tetrahedral structure, which is within the error limits of the method [14]. Moreover, many effects not included in these calculations tend to elongate this bond (see below). Thus, we cannot decide whether the reduced structure is slightly distorted by the protein or not, but it is clear that the energy involved is extremely small.

We have also studied the structure of azurin [37,61]. In this protein, there is another weak axial ligand of the copper ion, a backbone carbonyl group. This group is chemically similar to the amide oxygen ligand in stellacyanin. Therefore, it is not surprising that it prefers to bind to copper at a rather short distance, about 230 pm for the Cu(Im)(ImCH2CH2NHCOCH3)(SCH3)(S(CH3)2)+ model, similar to the Cu-O distance in stellacyanin. However, it costs less than 6 kJ/mole to move it to the distance found in the crystal structure (around 310 pm, see Figure 2). The same applies to the methionine ligand. It prefers to bind in the second sphere of the complex, but it also has a shallow minimum around 290 pm, which is only 3 kJ/mole higher in energy. The reduced site behaves similarly [61].
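The angle φ in Table 1 separates the trigonal structures (φ close to 90°) from the tetragonal model (62°). A minimal sketch of how such an angle can be computed from Cartesian coordinates, assuming φ is taken as the angle between the normals of the SCys-Cu-Ax and N-Cu-N planes, folded into the range 0-90°; the function name and the coordinates in the example are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_angle(cu: Vec3, s_cys: Vec3, ax: Vec3, n1: Vec3, n2: Vec3) -> float:
    """Angle (degrees, 0-90) between the SCys-Cu-Ax and N-Cu-N planes,
    obtained from the angle between the two plane normals."""
    normal1 = _cross(_sub(s_cys, cu), _sub(ax, cu))
    normal2 = _cross(_sub(n1, cu), _sub(n2, cu))
    dot = sum(p * q for p, q in zip(normal1, normal2))
    norm = math.hypot(*normal1) * math.hypot(*normal2)
    # abs() folds the result into 0-90 degrees, since a plane has no direction
    cos_phi = min(1.0, abs(dot) / norm)
    return math.degrees(math.acos(cos_phi))


if __name__ == "__main__":
    # Hypothetical coordinates: SCys and Ax in the xz plane, both N in the xy plane
    print(plane_angle((0.0, 0.0, 0.0), (2.2, 0.0, 0.0), (0.0, 0.0, 2.7),
                      (-1.0, 1.8, 0.0), (-1.0, -1.8, 0.0)))
```

With the example coordinates the two planes are perpendicular and the function returns 90°; moving the axial ligand into the N-Cu-N plane gives 0°.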
[Figure 2: energy versus bond length (pm) for the Cu-O and Cu-SMet bonds.]
Figure 2. Potential surfaces for the Cu-O and Cu-SMet bonds in three models of oxidised azurin. "No S" and "no O" refer to the Cu(Im)(ImCH2CH2NHCOCH3)(SCH3)+ and Cu(Im)2(SCH3)(S(CH3)2)+ models, respectively, whereas the other two curves were obtained with the Cu(Im)(ImCH2CH2NHCOCH3)(SCH3)(S(CH3)2)+ model with the Cu-O or Cu-SMet bond at its equilibrium distance [37].

Interestingly, if the copper ion in azurin is replaced by Co(II), things change drastically, as can be seen in Figure 3. If one of the axial ligands is removed, the other binds strongly to the metal ion, with a force constant of the same size as for the equatorial ligands and about six times larger than those of the Cu(II) complex. The optimum Co-O and Co-SMet distances are 207 and 243 pm, respectively. The potential surface for the carbonyl ligand is not changed much if the methionine ligand is added; cobalt still prefers a short Co-O bond, by about 30 kJ/mole over the distance found in the copper protein, which explains the short distance in the crystal structure (~220 pm) [62]. However, if the Co-O distance is fixed at the crystal value, the Co-SMet potential changes strongly: the methionine model prefers to bind in the second sphere, but the surface becomes extremely flat, so flat that the Co-SMet bond length can be varied between 280 and 380 pm at a cost of less than 1 kJ/mole. In the crystal structure it is 340-370 pm. Thus, the cobalt site is four-coordinate with a strong bond to the carbonyl group, whereas the copper site is effectively three-coordinate. The bonds to the other axial ligands are determined by interactions with the protein rather than with the metal. The only way to study such weak bonds in vacuum models is by studying the potential surfaces, such as those in Figures 2 and 3.
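Statements such as "a force constant ... about six times larger than those of the Cu(II) complex" come from comparing the curvature of potential surfaces like those in Figures 2 and 3. A minimal sketch, assuming a harmonic form E = E_min + (k/2)(r - r0)² near the minimum, that fits a quadratic to a bond-length scan by least squares; the function names are hypothetical, and the scan in the example is synthetic, not data from this chapter.

```python
from typing import List, Sequence, Tuple


def _det3(m: List[List[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Cramer's rule."""
    d = _det3(a)
    if d == 0.0:
        raise ValueError("singular system (need at least three distinct bond lengths)")
    sol = []
    for col in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][col] = b[i]
        sol.append(_det3(m) / d)
    return sol


def fit_harmonic(r: Sequence[float], e: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of E(r) = E_min + (k/2)(r - r0)^2; returns (k, r0, E_min).

    Bond lengths are centred on their mean before fitting to keep the
    normal equations well conditioned.
    """
    mean = sum(r) / len(r)
    x = [ri - mean for ri in r]
    s = [sum(xi ** p for xi in x) for p in range(5)]                   # sums of x^p
    t = [sum(ei * xi ** p for xi, ei in zip(x, e)) for p in range(3)]  # sums of E x^p
    a = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    c0, c1, c2 = _solve3(a, t)  # E = c0 + c1 x + c2 x^2
    if c2 <= 0.0:
        raise ValueError("fitted curve has no minimum")
    k = 2.0 * c2
    r0 = mean - c1 / (2.0 * c2)
    e_min = c0 - c1 * c1 / (4.0 * c2)
    return k, r0, e_min


if __name__ == "__main__":
    # Synthetic scan (pm, kJ/mol), generated from k = 0.01 kJ/mol/pm^2, r0 = 290 pm
    scan_r = [250.0 + 10.0 * i for i in range(9)]
    scan_e = [3.0 + 0.005 * (ri - 290.0) ** 2 for ri in scan_r]
    print(fit_harmonic(scan_r, scan_e))
```

The ratio of two fitted k values (for example Co-O versus Cu-O) gives the kind of comparison quoted in the text; on very flat, anharmonic surfaces such as the Co-SMet curve, the fit should be restricted to points near the minimum.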
[Figure 3: energy versus bond length (pm) for the Co-O and Co-SMet bonds.]

Figure 3. Potential surfaces for the Co-O and Co-SMet bonds in three models of Co(II)-substituted azurin [37]. "No S" and "no O" refer to the Co(Im)(ImCH2CH2NHCOCH3)(SCH3)+ and Co(Im)2(SCH3)(S(CH3)2)+ models, respectively, whereas the other two curves were obtained with the Co(Im)(ImCH2CH2NHCOCH3)(SCH3)(S(CH3)2)+ model with the Co-O or Co-SMet bonds either at their equilibrium distance or fixed at the crystal values, 223 or 356 pm, respectively.

3.2 Trigonal and tetragonal Cu(II) structures

Why does a Cu(II) ion with the ligands in the blue copper proteins assume a trigonal structure, whereas most inorganic cupric complexes are tetragonal (square planar, square pyramidal, or distorted octahedral) [63,64]? We have addressed this question by optimising the geometry of a number of models of the type CuII(NH3)3X, where X is OH-, SH-, SeH-, Cl-, NH2-, and some other ligands related to the cysteine thiolate group [65]. The results show that all complexes may assume two types of structures, both reflecting the Jahn-Teller instability of the tetrahedral Cu(II) complex (a d9 ion). This instability can be lifted either by a D2d distortion, leading to a tetragonal structure, or by a C2v distortion, leading to a trigonal structure.

The tetragonal structure is stabilised by four favourable σ interactions between the singly occupied Cu 3d orbital and pσ orbitals on the four ligands, as is shown in Figure 4. This gives rise to the well-known square-planar Cu(II) complexes. If one of the ligands instead has the ability to form a strong π bond with the copper ion, however, a trigonal structure can be stabilised. The right-hand side of Figure 4 shows that in such a structure, two of the ligands still form σ bonds to the copper ion, whereas a pπ orbital of the third ligand overlaps with two lobes of the singly occupied Cu 3d orbital. Thereby it occupies two positions in a square coordination plane, giving rise to a trigonal planar geometry. The fourth ligand cannot overlap with the singly occupied orbital and has to interact with a doubly occupied orbital. Since such an interaction is weaker, it becomes an axial ligand with an enlarged copper distance. Thus, the long axial bond is a result of the electronic structure. Moreover, the effective coordination number is decreased, so the three strong ligands in the trigonal complex bind at shorter distances to the copper ion than in the tetragonal complex. This explains the short Cu-SCys bond in the blue copper proteins, together with the fact that a π bond to copper is inherently shorter than a corresponding σ bond [34,39].
Figure 4. The singly occupied orbitals of the tetragonal (left) and trigonal (right) Cu(NH3)3(SH)+ complex [38].
For small and hard X ligands, such as NH3 and OH−, the tetragonal CuII(NH3)3X structure is the more stable (by 30–70 kJ/mole) [65]. For large, soft, and polarisable ligands, such as SH− and SeH−, on the other hand, the two types of structures have approximately the same stability (within 15 kJ/mole). Interestingly, the tetragonal structure is the more stable for Cu(NH3)3(SH)+, whereas the trigonal structure is the more stable for Cu(NH3)2(SH)(SH2)+, showing that the methionine ligand is also important for the structure of the blue copper proteins. This explains why very few trigonal cupric structures have been observed for small inorganic models: there simply is no such complex with the appropriate set of ligands, CuN2SS* (two nitrogen, one thiolate, and one thioether ligand) [63,66,67]. Formally, the Cu(NH3)3X complexes consist of a Cu(II) ion with nine d electrons and a neutral or negatively charged X ligand. For three soft and negatively charged ligands, SeH−, NH2−, and PH2−, however, the charge on the ligand moves to the copper ion, yielding a Cu(I) ion and an uncharged ligand radical. Since the Cu(I) ion has a full d shell, it prefers a tetrahedral structure, and the complexes with these three ligands are almost tetrahedral. The thiolate ligand is intermediate: the electron is delocalised between the copper and thiolate ions. Therefore, both the trigonal and the tetragonal structures are rather tetrahedral and are actually quite similar (cf. Figure 4). Naturally, this facilitates electron transfer to and from the complex by reducing the reorganisation energy. A characteristic difference between the two geometries of Cu(NH3)2(SH)(SH2)+ is that the tetragonal structure has a longer Cu–SCys bond and a shorter Cu–SMet bond than the trigonal structure.
Moreover, the φ angle (the angle between the S–Cu–S and N–Cu–N planes) is smaller in the tetragonal structure. In addition, the two largest angles around the copper ion in the trigonal structure are between SCys and the two NHis atoms, whereas in the tetragonal structure the two largest angles are between two distinct pairs of atoms (SCys–Cu–N and SMet–Cu–N).
These structural differences are reminiscent of the differences between the crystal structures of plastocyanin and nitrite reductase (cf. Table 1). Detailed investigations have shown that this is not accidental: plastocyanin and, more generally, the axial type 1 copper proteins have a trigonal structure with a π bond between copper and the thiolate ligand, whereas the so-called rhombic type 1 copper proteins (e.g. nitrite reductase) have a tetragonal structure with mainly σ bonds to the copper ion (cf. Figures 6 and 7). This gives an attractive explanation of the structural and spectroscopic differences between these proteins, which share the same copper ligand sphere [34,65]. The two types of structures have almost the same energy (within 7 kJ/mole), and which structure is the more stable depends on the models used for the ligands. At present it is not possible to decide whether the native structure of the typical blue-copper ligand sphere is trigonal or tetragonal [26,34,54,65]. Thus, with the typical blue-copper ligands, the tetragonal Jahn–Teller distortion may at worst give rise to the structure found in nitrite reductase, i.e. a fully functional site with properties (reduction potentials and reorganisation energies) similar to those of the trigonal blue copper proteins [35,68]. This shows that with the blue-copper ligands there is no need for protein strain to avoid a tetragonal structure. Using free energy perturbations, we have studied why some proteins stabilise the trigonal structure, whereas others stabilise the tetragonal structure, although the ligand sphere is the same [54]. The results indicate that plastocyanin prefers the bond lengths and electrostatics of the trigonal structure, whereas nitrite reductase favours the angles in the tetragonal structure, both by 10–20 kJ/mole. Interestingly, the length of the Cu–SMet bond has a very small influence on the relative stability of the two conformations.
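The φ angle that separates trigonal (φ close to 90°) from tetragonal (smaller φ) structures can be computed from atomic coordinates as the angle between the normals of the SCys–Cu–SMet and N–Cu–N planes. The following is a minimal sketch, assuming NumPy is available and that the coordinates of Cu, SCys, SMet, and the two NHis atoms are known; the function names and the toy coordinates are illustrative only and are not taken from any of the structures discussed here.

```python
import numpy as np

def plane_normal(a, b, c):
    """Unit normal of the plane through the points a, b, and c."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n)

def phi_angle(cu, s_cys, s_met, n1, n2):
    """Angle (degrees, 0-90) between the SCys-Cu-SMet and N-Cu-N planes."""
    n_s = plane_normal(cu, s_cys, s_met)
    n_n = plane_normal(cu, n1, n2)
    # The sign of a plane normal is arbitrary, so use |cos| to stay within 0-90 degrees
    cos_phi = abs(np.dot(n_s, n_n))
    return float(np.degrees(np.arccos(np.clip(cos_phi, 0.0, 1.0))))

# Toy coordinates (pm): Cu at the origin, SMet on the z axis, SCys in the xz plane
cu, s_cys, s_met = (0, 0, 0), (210, 0, -50), (0, 0, 280)
# N atoms in the yz plane, perpendicular to the S-Cu-S plane: trigonal limit
print(f"{phi_angle(cu, s_cys, s_met, (0, 200, -60), (0, -200, -60)):.1f}")  # 90.0
# N atoms in the xz plane, coplanar with the S-Cu-S plane: square-planar limit
print(f"{phi_angle(cu, s_cys, s_met, (-200, 0, 0), (0, 0, -200)):.1f}")  # 0.0
```

Using the absolute value of the cosine makes the result independent of the order in which the atoms of each plane are listed.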
The existence of trigonal and tetragonal structures seems to be general for copper–cysteine complexes. An illustrative example is the geometry of the catalytic metal ion in copper-substituted alcohol dehydrogenase (the native enzyme contains zinc). In this enzyme, the copper ion is coordinated by two cysteines, one histidine, and a ligand from the solution. A crystal structure with dimethylsulfoxide as the fourth ligand shows a trigonal structure with dimethylsulfoxide as an axial ligand at a large distance (319–345 pm) [69]. Our calculations [39] show that the electronic structure of such complexes is very similar to that of the traditional blue copper proteins. The NHis atom and one of the SCys atoms form σ bonds to the copper ion, whereas the other SCys atom forms a π bond with copper. Thus, the two cysteine ligands are not equivalent; the π-bonded ligand has a shorter Cu–SCys bond and larger angles to the other ligands than the σ-bonded ligand (217 and 225 pm, respectively). Tetragonal structures may also be obtained for models of Cu-alcohol dehydrogenase [39]. When the fourth ligand is uncharged, they are less stable than the corresponding trigonal complexes (in accordance with the crystal structure). However, with OH− (which is involved in the reaction mechanism of the enzyme) as the fourth ligand, the tetragonal structure is the more stable. This may explain some of the spectral shifts that are observed experimentally when the coenzyme or the ligands are exchanged [69].
Table 2. The effect of the model size, basis set, and density functional method on the geometry of the trigonal models of the oxidised blue-copper site [26].
Method | Model (a) | Basis set (b) | Cu–SCys (pm) | Cu–N (pm) | Cu–SMet (pm) | N–Cu–N (°) | SCys–Cu–N (°) | S–Cu–S (°) | SMet–Cu–N (°) | φ (°)
B3LYP | 1 | 1 | 218 | 204–205 | 267 | 103 | 119–122 | 117 | 94–95 | 89.1
B3LYP | 1 | 2 | 219 | 206 | 269 | 104 | 119–122 | 117 | 94–96 | 88.5
B3LYP | 1 | 3 | 218 | 205 | 267 | 103 | 118–123 | 117 | 94–96 | 88.3
B3LYP | 2 | 1 | 218 | 204–206 | 271 | 102 | 120–125 | 115 | 94–95 | 89.7
BP86 | 1 | 1 | 219 | 203–210 | 236 | 103 | 102–130 | 116 | 99–105 | 81.1
(a) The models are 1, Cu(Im)2(SCH3)(S(CH3)2)+, or 2, Cu(ImCH3)2(SC2H5)(S(CH3)(C2H5))+.
(b) The basis sets are 1, DZpdf/6-31G*; 2, TZVPP; or 3, DZs2pd2f/6-311(+)G(2d,2p) [26].
3.3 The sensitivity of the geometries to the theoretical method
When we did the geometry optimisations of the plastocyanin models six years ago, they were on the verge of the possible; each optimisation took three CPU months. Today, such calculations can be done routinely in less than a week on a standard workstation. Therefore, we now have the opportunity to test whether these calculations were converged with respect to the basis sets or model systems. In Table 2, we list the results of a series of such calculations for the oxidised trigonal plastocyanin model, Cu(Im)2(SCH3)(S(CH3)2)+ [26]. Clearly, the results are very stable. If the basis set is enhanced to triple-ζ, with diffuse functions on S and N and double polarising functions on all atoms (DZs2pd2f/6-311(+)G(2d,2p) [25]), or is changed to the TZVPP basis set (with a d function on H and an f function on other atoms) [70], the bond lengths to copper change by at most 2 pm and the angles by at most 1°. Similarly, if a methyl group is added to all ligands (Cu(ImCH3)2(SC2H5)(C2H5SCH3)+), only small changes are observed, up to 4 pm in the bond lengths and 3° in the angles. It would have been interesting to check whether the results are also converged with respect to the method. Unfortunately, there is no method that is clearly better and can be used for models of this size. It is still not possible to do analytical geometry optimisations with the CASPT2 method, and test calculations with the MP2 method indicate that these results cannot be converged with respect to the basis set [26]. Therefore, we have only made calculations with another common density functional method, Becke–Perdew86 (BP86) [71,72]. In general, it is slightly less accurate than B3LYP [22], but it has the advantage of lacking exact exchange, which in combination with other techniques may make the calculations about five times faster than B3LYP [73].
However, as can be seen in Table 2, the change in the density functional leads to quite appreciable changes in the geometry, up to 10° in the angles and as much as 35 pm for the Cu–SMet bond length. Even if these changes are not so large in energy terms, the structure is appreciably less similar to the experimental structures, and therefore we cannot recommend this method for general use. Similar results apply for the reduced models, but owing to the weaker interaction with copper, the changes are slightly larger, up to 6 pm and 7° [26]. Most importantly, however, the relative energy between the optimal geometry and the complex with the Cu–SMet bond length fixed at 290 pm is converged; it changes by less than 1 kJ/mole if the basis set or model system is increased. With the Becke–Perdew86 method, bond lengths change by up to 9 pm and angles by less than 4°, but the relative energy changes by 6 kJ/mole [26].
Our study of the reduction potential of the blue copper proteins indicated that the geometries may change quite appreciably when solvation effects are taken into account [35]. We have therefore performed a number of geometry optimisations of the reduced Cu(Im)2(SCH3)(S(CH3)2) complex in a solvent with varying values of the dielectric constant [26]. The results in Table 3 show that the bond lengths change by up to 11 pm and the angles by up to 11°. Thus, the geometry changes in the solvent, but the effects are not very large, and the general geometry is not changed (the φ angle does not decrease below 86°). In particular, the effect of the solvent is smaller than the results for the reduction potential indicate. It is notable that the length of the Cu–SMet bond increases with the dielectric constant. Thus, solvent effects may explain part of the difference between the optimised and crystal structures for the reduced complexes. Although the Cu–SMet bond does not become longer than 248 pm, this elongation decreases the energy difference between the optimised and the crystal structure (to below 4 kJ/mole). Furthermore, increasing the basis set [26] or the model size [26], or improving the theoretical method [14], also elongates the Cu–SMet bond, as do dynamic effects [54]. If all these corrections are added together, the Cu–SMet bond length should become ~270 pm, but non-additive effects may make it even longer. Therefore, it is not clear whether there is any discrepancy at all between the calculated and experimental length of this bond, but if there is any, it is very small in energy terms.
Table 3. The effect of the dielectric constant (ε) on the geometry of the Cu(Im)2(SCH3)(S(CH3)2) complex [26]. The calculations were performed with the B3LYP method, the DZpdf/6-31G* basis set, and the CPCM solvation model with a water probe.
ε | Cu–SCys (pm) | Cu–N (pm) | Cu–SMet (pm) | N–Cu–N (°) | SCys–Cu–N (°) | S–Cu–S (°) | SMet–Cu–N (°) | φ (°)
1 | 232 | 214–215 | 237 | 109 | 105–108 | 115 | 107–113 | 89.4
2 | 233 | 209–211 | 240 | 116 | 106–108 | 106 | 103–115 | 87.0
4 | 234 | 209–211 | 241 | 117 | 107–109 | 105 | 103–114 | 86.8
8 | 233 | 209–210 | 244 | 118 | 108–112 | 106 | 100–112 | 86.7
16 | 235 | 208 | 246 | 120 | 108–110 | 104 | 103–110 | 87.6
80 | 232 | 208–211 | 248 | 112 | 108–121 | 104 | 98–110 | 87.4
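The trend in the Cu–SMet column of Table 3 can be checked mechanically. This is only a small sketch in plain Python: the dictionary transcribes the Cu–SMet distances from Table 3, and the variable names are arbitrary.

```python
# Cu-SMet distances (pm) from Table 3, keyed by the dielectric constant epsilon
cu_smet = {1: 237, 2: 240, 4: 241, 8: 244, 16: 246, 80: 248}

# Sort by epsilon so that the comparison follows increasing solvent polarity
lengths = [cu_smet[eps] for eps in sorted(cu_smet)]

# True if the bond never shortens when epsilon increases
monotonic = all(later >= earlier for earlier, later in zip(lengths, lengths[1:]))
# Total elongation between vacuum (epsilon = 1) and a water-like epsilon = 80
elongation = lengths[-1] - lengths[0]

print(f"monotonic: {monotonic}; elongation from eps=1 to eps=80: {elongation} pm")
# monotonic: True; elongation from eps=1 to eps=80: 11 pm
```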
Figure 5. A comparison between the crystal structure of reduced plastocyanin (light grey and no hydrogen atoms) [74] and the structures of Cu(Im)2(SCH3)(S(CH3)2) optimised in vacuum (dark grey) [14] or with COMQUM in reduced plastocyanin [26].
3.4 Geometry optimisations in the protein
The best way to study the geometry of the blue copper proteins is to perform geometry optimisations in the protein using combined quantum chemical and molecular mechanical methods. We have recently initiated a series of such calculations using the program COMQUM [57], which uses the B3LYP method for the active site and the Amber force field [40] for the rest of the protein [26]. Some of the results of these calculations are shown in Table 4. It is clear that the COMQUM structures are appreciably more similar to crystal structures than structures optimised in vacuum. This is most obvious for the orientation of the histidine rings and the dihedrals of the other copper ligands, as can be seen in Figure 5. This improvement is quite natural, since in vacuum these low-energy modes are determined by weak hydrogen bonds involving the methyl groups. In the protein, they are instead determined by interactions with the surrounding protein, e.g. steric effects, normal hydrogen bonds, and non-polar interactions. However, there is also a significant improvement of the Cu–ligand distances and the angles around the copper ion, as can be seen in Table 4. In all COMQUM structures, the Cu–SCys bond is appreciably shorter than in vacuum, which makes them more similar to what is found in crystal structures (they are still a few pm too long, which reflects the tendency of B3LYP to give too long Cu–SCys bonds [14]). This is probably an effect of the NH···SCys hydrogen bonds in the protein. Similarly, the Cu–N distances are 3–10 pm shorter than in the vacuum structures, again improving the agreement with crystal structures. The SCys–Cu–SMet, SMet–Cu–N, and φ angles are also significantly improved, especially for the oxidised systems.
Table 4. The results of COMQUM calculations on plastocyanin, nitrite reductase, and cucumber basic protein, using the quantum system Cu(Im)2(SCH3)(S(CH3)2)+ [26].
Cu (a) | Protein (a) | Con (a) | Cu–SCys (pm) | Cu–N (pm) | Cu–SMet (pm) | N–Cu–N (°) | SCys–Cu–N (°) | S–Cu–S (°) | SMet–Cu–N (°) | φ (°)
I | Vacuum | – | 232 | 214–215 | 237 | 109 | 105–108 | 115 | 107–113 | 89
I | Pc red | Yes | 221 | 203–208 | 339 | 103 | 120–134 | 104 | 78–101 | 76
I | Pc red | No | 222 | 203–212 | 375 | 103 | 120–136 | 105 | 72–103 | 71
I | Crystal | – | 211–217 | 203–239 | 287–291 | 91–118 | 110–141 | 99–114 | 83–110 | 74–80
II | Vacuum | – | 218 | 204 | 267 | 103 | 120–122 | 116 | 94–95 | 90
II | Pc ox | Yes | 214 | 197–198 | 290 | 103 | 123–125 | 105 | 86–106 | 78
II | Crystal | – | 207–221 | 189–222 | 278–291 | 96–104 | 112–144 | 102–110 | 85–108 | 77–89
II (b) | Vacuum | – | 223 | 205–206 | 242 | 100 | 97–141 | 103 | 95–126 | 62
II | Nir | Yes | 219 | 200–203 | 262 | 100 | 104–135 | 105 | 87–129 | 61
II | Nir | No | 224 | 203–205 | 241 | 97 | 91–146 | 105 | 89–141 | 48
II | Crystal | – | 208–223 | 193–222 | 246–270 | 96–102 | 98–140 | 103–109 | 84–138 | 56–65
II (c) | CBP | Yes | 217 | 199–210 | 273 | 100 | 118–128 | 104 | 84–118 | 69
II | Crystal | – | 216 | 193–195 | 261 | 99 | 110–138 | 111 | 83–112 | 70
(a) The system is defined by the oxidation state of the copper ion (Cu), the protein (Pc red, reduced plastocyanin; Pc ox, oxidised plastocyanin; CBP, cucumber basic protein; Nir, nitrite reductase; Vacuum, quantum chemical optimisation in vacuum [14,34]; Crystal, range observed in the available crystal structures in the Brookhaven Protein Data Bank), and whether there is a connection between the metal ligands and the protein backbone (Con).
(b) This is the tetragonal structure, which in vacuum is obtained with the Cu(Im)2(SH)(S(CH3)2)+ model [34,65].
(c) This is a structure intermediate between trigonal and tetragonal that has not been observed in vacuum.
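One simple way to quantify "more similar to crystal structures" is to measure how far each computed value lies outside the range observed in the crystal structures. The sketch below does this for the oxidised plastocyanin rows of Table 4, comparing only single-valued columns; the helper name range_deviation is hypothetical and not part of COMQUM or any other program.

```python
def range_deviation(value, lo, hi):
    """How far value lies outside the interval [lo, hi] (0 if inside)."""
    if value < lo:
        return lo - value
    if value > hi:
        return value - hi
    return 0

# Oxidised plastocyanin (Cu(II) rows of Table 4): crystal ranges as (low, high)
crystal = {"Cu-SCys (pm)": (207, 221), "Cu-SMet (pm)": (278, 291), "phi (deg)": (77, 89)}
vacuum = {"Cu-SCys (pm)": 218, "Cu-SMet (pm)": 267, "phi (deg)": 90}
comqum = {"Cu-SCys (pm)": 214, "Cu-SMet (pm)": 290, "phi (deg)": 78}

for label, model in (("vacuum", vacuum), ("COMQUM", comqum)):
    deviations = {key: range_deviation(model[key], *crystal[key]) for key in crystal}
    print(label, deviations)
# vacuum {'Cu-SCys (pm)': 0, 'Cu-SMet (pm)': 11, 'phi (deg)': 1}
# COMQUM {'Cu-SCys (pm)': 0, 'Cu-SMet (pm)': 0, 'phi (deg)': 0}
```

Applied to the reduced plastocyanin row with the backbone connection, the same check flags the too-long COMQUM Cu–SMet bond: range_deviation(339, 287, 291) gives 48 pm.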
For the Cu–SMet bond length, the results are less clear. In all cases, the bond is elongated, and for the oxidised structures it is in excellent agreement with experimental structures. However, for reduced plastocyanin, the Cu–SMet bond becomes too long, 339 pm, compared to ~290 pm. This is most likely due to the flexibility of the bond, combined with problems in the classical force field. Apparently, the molecular mechanics part of the calculations is not accurate enough to describe the fine-tuned interplay between the methionine group, the copper ion, and the surrounding enzyme. Nitrite reductase and cucumber basic protein were also included in the investigation to see whether the protein could stabilise tetragonal and intermediate structures, although such structures cannot be found in vacuum with the quantum system used (Cu(Im)2(SCH3)(S(CH3)2)+). The results in Table 4 show that this is actually the case. The COMQUM structures give φ angles of 61° and 69°, respectively, which are close to the experimental values and clearly show that the nitrite reductase structure is tetragonal, whereas the cucumber basic protein structure is intermediate. The large SMet–Cu–N and SCys–Cu–N angles also indicate that the structures are not trigonal. At first, these improved structures could be taken as evidence for protein strain. However, the COMQUM calculations involve effects that are normally not considered as strain, e.g. the change in the dielectric surrounding of the copper site, electrostatic interactions, and hydrogen bonds. In order to distinguish between these effects, we performed two calculations in which the covalent bond between the backbone and the side chain of the metal ligands is removed (cf. Table 4) [26]. In this way, covalent strain effects from the protein are eliminated. Interestingly, this hardly changed the structure of the plastocyanin site at all, except for an elongated Cu–SMet bond. However, the nitrite reductase structure became more similar to the vacuum structure, except for a larger variation in the SMet–Cu–N angles and a smaller φ angle (even smaller than in crystal structures). Thus, the nitrite reductase structure seems to be tuned by covalent interactions, whereas the plastocyanin site is modified by electrostatic interactions. This is in excellent agreement with our free energy calculations of the two proteins [54], which indicated that plastocyanin preferred the trigonal structure by electrostatic interactions, whereas nitrite reductase favoured the angles in the tetragonal structure. It should be noted, however, that already the vacuum structures reproduce most of the features of the copper coordination. Protein interactions are used only to fine-tune the structures at a small cost in energy.
Actually, the COMQUM calculations give us an opportunity to directly estimate strain energies in the proteins. The strain energy is given by the difference in energy of the isolated quantum system at the COMQUM geometry and at the optimal vacuum geometry. This energy ranges from 33 to 51 kJ/mole. This is similar to what was found for the catalytic and structural zinc ions in alcohol dehydrogenase, 30–60 kJ/mole [57,75–77], which seems to be a normal strain energy for the incorporation of a metal site from vacuum into a protein. If the connection between the protein and the metal ligands is removed, the strain energy is approximately halved.
The difference (21–23 kJ/mole) is close to the strain energy in the sense of Warshel [50] (see Section 8) and also in the common mechanical sense (a distortion of the structure caused by covalent interactions). This energy is actually appreciably lower than what was found for alcohol dehydrogenase (33 kJ/mole) [76]. Yet, even these energies involve some terms that are not commonly regarded as strain. In the vacuum structure, there are hydrogen bonds between the methyl groups and the negatively charged SCys atom. These are removed in the COMQUM structure, but the more favourable interactions in the protein are not included when the strain energies are calculated. This gives a significant positive contribution to the strain energy. Therefore, it is not surprising that the strain energies are still not negligible, but it is clear that the COMQUM calculations give no evidence of any unusual strain energies for the blue copper proteins.
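The strain-energy definition used above amounts to two single-point energies of the isolated quantum system. A minimal sketch, assuming the total energies are available in hartree from some quantum chemistry program; the function names are hypothetical, and the energies in the usage lines are made up purely to illustrate the arithmetic (they are not results from this chapter).

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # 1 hartree expressed in kJ/mol

def strain_energy(e_qm_at_protein_geometry, e_qm_at_vacuum_minimum):
    """Strain energy (kJ/mol): energy of the isolated quantum system at the
    QM/MM (e.g. COMQUM) geometry minus its energy at the vacuum optimum.
    Both inputs are total energies in hartree."""
    return (e_qm_at_protein_geometry - e_qm_at_vacuum_minimum) * HARTREE_TO_KJ_PER_MOL

def covalent_strain(strain_connected, strain_disconnected):
    """Part of the strain (kJ/mol) that disappears when the bonds between the
    metal ligands and the protein backbone are removed."""
    return strain_connected - strain_disconnected

# Made-up energies (hartree), chosen only to illustrate the arithmetic
e_vacuum = -2500.000000        # isolated quantum system, vacuum optimum
e_connected = -2499.984000     # same system at a QM/MM geometry, backbone connected
e_disconnected = -2499.992000  # same system at a QM/MM geometry, backbone cut

s_con = strain_energy(e_connected, e_vacuum)
s_dis = strain_energy(e_disconnected, e_vacuum)
print(f"connected {s_con:.1f}, disconnected {s_dis:.1f}, "
      f"covalent part {covalent_strain(s_con, s_dis):.1f} kJ/mol")
# connected 42.0, disconnected 21.0, covalent part 21.0 kJ/mol
```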
4. ELECTRONIC SPECTRA
The hallmark of cupredoxins, leading to their description as blue or type 1 copper proteins, is the presence in their electronic spectrum of an intense (ε = 3 000–6 000 M⁻¹ cm⁻¹) absorption band around 600 nm. This spectral feature distinguishes them from normal inorganic Cu(II) complexes, whose spectra contain only a number of weaker (ε = 100 M⁻¹ cm⁻¹) ligand-field transitions in the same region [78]. However, variations also exist within the type 1 proteins. In addition to the prominent peak at 600 nm, a feature at 450 nm is observed in all spectra with a varying intensity [79,80]. The axial type 1 proteins, like plastocyanin and azurin, show only little absorption in the 450-nm region, whereas this band becomes much more prominent in rhombic type 1 proteins, like pseudoazurin, cucumber basic protein, and stellacyanin. The increasing intensity of the 450-nm band in the latter proteins goes together with a decrease in intensity of the 600-nm band, so that the sum of ε460 and ε600 is approximately constant [79]. Nitrite reductase from Achromobacter cycloclastes is a limiting case, for which the 460-nm line is actually more intense than the 600-nm peak, giving the enzyme a green colour. No natural proteins exist in which the blue band is reduced even further, but by site-directed mutagenesis a number of mutants have been constructed in which only the second band is present, blue-shifted towards 410 nm, giving them a yellow to orange colour [81]. Based on the analogy of their EPR spectra with those of the normal type 2 copper proteins, these mutants have been classified as type 2 [82]. The classification of mutants with intermediate spectroscopic characteristics as type 1.5 follows naturally. Apart from these two peaks, several weaker features have been discerned in the visible and near-infrared region of the spectra of type 1 copper proteins.
Based on different types of spectroscopic analyses and with the help of density functional Xα calculations, Solomon and coworkers [83,84] have reported and assigned a total of nine absorption bands in the spectrum of plastocyanin (cf. Table 5). They assigned the 600-nm (16 700 cm⁻¹) band to a charge-transfer excitation from an SCys p orbital with π overlap to a Cu orbital. The band at 460 nm (21 370 cm⁻¹) was proposed to correspond to a His→Cu charge transfer, whereas an additional feature at 535 nm (18 700 cm⁻¹) was assigned to a charge transfer from the so-called SCys pseudo-σ orbital. Similar studies have more recently been performed on nitrite reductase [85], cucumber basic protein, and stellacyanin [86]. Below, we describe our spectroscopic studies of the blue copper proteins with the more accurate CASPT2 method, leading to a unified theory for the spectra of copper–cysteinate proteins.
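Band positions in this section are quoted both as wavelengths (nm) and as wavenumbers (cm⁻¹). The conversion is simply ν̃ (cm⁻¹) = 10⁷ / λ (nm), since 1 cm = 10⁷ nm. A minimal Python sketch (the function names are illustrative) shows, for example, that the 600-nm band lies near 16 700 cm⁻¹.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1.0e7 / wavelength_nm

def wavenumber_to_nm(wavenumber_cm):
    """Convert a wavenumber in cm^-1 back to a wavelength in nm."""
    return 1.0e7 / wavenumber_cm

for band_nm in (600, 535, 460, 410):
    print(f"{band_nm} nm -> {nm_to_wavenumber(band_nm):,.0f} cm^-1")
# 600 nm -> 16,667 cm^-1
# 535 nm -> 18,692 cm^-1
# 460 nm -> 21,739 cm^-1
# 410 nm -> 24,390 cm^-1
```

Because the relation is a reciprocal, equal shifts in wavelength do not correspond to equal shifts in energy; the 460 to 410 nm blue shift of the type 2 mutants amounts to roughly 2 650 cm⁻¹.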
4.1 The electronic spectrum of plastocyanin
We have studied the electronic spectrum of plastocyanin with the CASSCF/CASPT2 approach [36]. The blue copper site in this protein is not symmetric. However, the N–Cu–N and S–Cu–S planes are approximately perpendicular, so the geometry can be changed to Cs symmetry with modest movements. Such a symmetrisation simplifies the labelling of the excited states and speeds up the calculations, so that larger models and more excited states can be studied. However, our most reliable results were obtained for an unsymmetrical Cu(Im)2(SH)(SH2)+ model (for which we can include a point-charge model of the surrounding protein and solvent), corrected for the truncated cysteine and methionine models [36]. A total of nine states have been studied, including the five ligand-field states and the four lowest ligand-to-metal charge-transfer states. The results are shown in Table 5 together with experimental excitation energies. The various excited states can best be characterised by analysing the singly occupied molecular orbital of each state. These orbitals are shown in Figure 6. The singly occupied orbital of the X ²A″ ground state is strongly delocalised over the Cu–SCys bond. It involves a π antibonding interaction between the Cu 3dxy and SCys 3py orbitals, combined with a much weaker σ antibonding interaction with the two N ligands, whose positions in the equatorial plane are such that a perfect overlap with the two remaining lobes of the Cu 3dxy orbital is obtained (the coordinate system is selected so that the copper ion is at the origin, SMet is on the z axis, and SCys is in the xz plane). The singly occupied orbital of the first excited state (a ²A′) is formed by a σ antibonding combination of the Cu 3dx²−y² and SCys 3px orbitals. This interaction is also strongly covalent. The calculated excitation energy for this state is 4 119 cm⁻¹, which explains the appearance of the band at 5 000 cm⁻¹ in the plastocyanin spectrum.
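With the coordinate system described above (Cu at the origin, SMet on the z axis, SCys in the xz plane), the mirror plane of the symmetrised Cs model is the xz plane, and each Cu 3d orbital is either symmetric (A′) or antisymmetric (A″) under the reflection y → −y. This is why a state whose hole sits in 3dxy carries an A″ label. The sketch below, assuming NumPy and using unnormalised real d functions, checks the parity numerically at random points; it considers only the metal d functions and ignores the ligand contributions to the orbitals.

```python
import numpy as np

# Angular parts (unnormalised) of the five real Cu 3d orbitals
D_ORBITALS = {
    "3dxy": lambda x, y, z: x * y,
    "3dyz": lambda x, y, z: y * z,
    "3dxz": lambda x, y, z: x * z,
    "3dx2-y2": lambda x, y, z: x**2 - y**2,
    "3dz2": lambda x, y, z: 2 * z**2 - x**2 - y**2,
}

def cs_irrep(f, n_points=20, seed=0):
    """Irrep of f in Cs when the mirror plane is xz (reflection y -> -y)."""
    x, y, z = np.random.default_rng(seed).normal(size=(3, n_points))
    reflected, original = f(x, -y, z), f(x, y, z)
    if np.allclose(reflected, original):
        return "A'"
    if np.allclose(reflected, -original):
        return "A''"
    return "no definite parity"

for name, func in D_ORBITALS.items():
    print(name, cs_irrep(func))
# 3dxy A''
# 3dyz A''
# 3dxz A'
# 3dx2-y2 A'
# 3dz2 A'
```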
Between 10 000 and 14 000 cm⁻¹, three bands are found in the experimental spectrum, corresponding to the calculated states b ²A′, b ²A″, and c ²A′. From the composition of the corresponding singly occupied orbitals, it is clear that these states can be labelled as genuine ligand-field states, with the electron hole localised in the Cu 3dz², 3dyz, and 3dxz orbitals, respectively. The presence of a definite amount of SCys 3pπ character in the Cu 3dyz orbital of the b ²A″ state is notable. This mixing gives a significant intensity to the transition to this state, which is in fact responsible for the second most intense band in the plastocyanin spectrum. The dominant blue band, appearing at 16 700 cm⁻¹ in the experimental spectrum, was calculated at 17 571 cm⁻¹ and corresponds to the c ²A″ state. As can be seen from Figure 6, the corresponding singly occupied orbital is the bonding counterpart of the Cu–SCys π antibonding ground-state orbital. The extremely good overlap between the two orbitals immediately explains the large absorption intensity of the corresponding excitation. Even if this transition can formally be labelled as an SCys→Cu charge-transfer excitation, the actual amount of charge transferred is only about 0.2 e.
Figure 6. The singly occupied orbitals of the various excited states in the symmetric Cu(Im)2(SCH3)(S(CH3)2)+ model, calculated at the CASSCF level [36].
At higher energies, four additional charge-transfer bands were observed in the experimental spectrum, at 18 700, 21 390, 23 440, and 32 500 cm⁻¹, respectively [84]. The latter two bands were assigned by Gewirth and Solomon [84] as charge-transfer excitations from methionine and histidine, respectively. We could only study these excitations in models with enforced symmetry. The results are therefore more approximate than for the lower excitations, but they are in line with Gewirth and Solomon's assignments. However, for the bands at 18 700 and 21 390 cm⁻¹, there is a discrepancy between the experimental spectrum and our calculations. Indeed, we predict only one excited state in this region of the spectrum, d ²A′. The singly occupied orbital in this state is the σ bonding combination of Cu 3dx²−y² and S 3px, i.e. the bonding counterpart of the antibonding orbital of the first excited a ²A′ state. This is Gewirth and Solomon's pseudo-σ orbital [84]. They assign the band at 18 700 cm⁻¹ in the experimental spectrum to the excitation to the d ²A′ state, and the band at 21 390 cm⁻¹ to another His→Cu charge-transfer excitation. Our calculated excitation energy for the d ²A′ state is 20 599 cm⁻¹, between the experimental bands at 18 700 and 21 390 cm⁻¹, but closer to the latter. Still, our assignment of the 21 390-cm⁻¹ band as the transition to d ²A′ comes mainly from an analysis of the SCys pσ→Cu transition in other proteins and as a function of the φ angle (see the next section). According to this analysis, the transition energy should remain constant for the various proteins. Therefore, it seems unlikely that the d ²A′ state would appear more than 3 000 cm⁻¹ lower in energy in plastocyanin (18 700 cm⁻¹) than in nitrite reductase (21 900 cm⁻¹ [85]). Moreover, the intensity of the SCys pσ→Cu transition should increase significantly with a decreasing φ angle, which is in accordance with the increasing intensity of the 460-nm peak (21 700 cm⁻¹) in the experimental spectra of the rhombic type 1 proteins. In addition, experimental evidence indicates that SCys, rather than imidazole, is involved in this band [79,87]. Therefore, it is more plausible to assign the d ²A′ state to the band at 21 390 cm⁻¹, although this means that we have to leave the 18 700-cm⁻¹ band unassigned. It is notable that the latter band is not present in the experimental spectrum of nitrite reductase [85].
Table 5. The experimental [84,85] and calculated [34,36] spectra of plastocyanin and nitrite reductase (excitation energies in cm⁻¹, oscillator strengths in parentheses) together with the assignment of the various excitations. The ground-state singly occupied orbital is Cu–SCys π* in plastocyanin but Cu–SCys σ* in nitrite reductase.
State | Plastocyanin assignment | Plastocyanin calculated | Plastocyanin experimental | Nitrite reductase experimental | Nitrite reductase calculated | Nitrite reductase assignment
a ²A′ | Cu–SCys σ* | 4 119 (0.000) | 5 000 (0.000) | 5 600 (0.000) | 4 408 (0.000) | Cu–SCys π*
b ²A′ | 3dz² | 10 974 (0.000) | 10 800 (0.003) | 11 900 (0.003) | 12 329 (0.000) | 3dz²
b ²A″ | 3dyz | 13 117 (0.001) | 12 800 (0.011) | 13 500 (0.009) | 12 872 (0.000) | 3dyz
c ²A′ | 3dxz | 13 493 (0.000) | 13 950 (0.004) | 14 900 (0.010) | 13 873 (0.003) | 3dxz
c ²A″ | Cu–SCys π | 17 571 (0.103) | 16 700 (0.050) | 17 550 (0.020) | 15 789 (0.032) | Cu–SCys π
d ²A′ | Cu–SCys σ | 20 599 (0.001) | 21 390 (0.005) | 21 900 (0.030) | 22 461 (0.119) | Cu–SCys σ
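The deviations between the calculated and experimental excitation energies in Table 5 can be tabulated directly. A plain-Python sketch: the dictionaries only transcribe Table 5 (state labels written in ASCII), and max_abs_error is a hypothetical helper name.

```python
# Excitation energies (cm^-1) transcribed from Table 5: state -> (calculated, experimental)
plastocyanin = {
    "a 2A'": (4119, 5000),
    "b 2A'": (10974, 10800),
    "b 2A''": (13117, 12800),
    "c 2A'": (13493, 13950),
    "c 2A''": (17571, 16700),
    "d 2A'": (20599, 21390),
}
nitrite_reductase = {
    "a 2A'": (4408, 5600),
    "b 2A'": (12329, 11900),
    "b 2A''": (12872, 13500),
    "c 2A'": (13873, 14900),
    "c 2A''": (15789, 17550),
    "d 2A'": (22461, 21900),
}

def max_abs_error(table):
    """Return (state, |calculated - experimental|) for the worst-reproduced state."""
    state = max(table, key=lambda s: abs(table[s][0] - table[s][1]))
    calculated, experimental = table[state]
    return state, abs(calculated - experimental)

for name, table in (("plastocyanin", plastocyanin), ("nitrite reductase", nitrite_reductase)):
    print(name, max_abs_error(table))
# plastocyanin ("a 2A'", 881)
# nitrite reductase ("c 2A''", 1761)
```

Both maxima stay below 1 800 cm⁻¹; the largest single deviation is the blue band of nitrite reductase.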
4.2 Correlation between structure and spectroscopy of copper proteins
On the basis of the electronic, resonance Raman, and EPR spectra, the cysteine-containing copper proteins have been divided into four groups: axial type 1 (e.g. plastocyanin), rhombic type 1 (e.g. nitrite reductase and stellacyanin), type 1.5, and type 2 (mutant) copper proteins [81]. We have studied the spectra of members of each group with the CASPT2 method [33,34,36,38].
Figure 7. The singly occupied ground-state orbitals for four models of rhombic type 1 copper proteins, calculated at the CASSCF level [34].
In Figure 7, the ground-state singly occupied orbitals of three rhombic type 1 proteins, viz. cucumber basic protein (plantacyanin), pseudoazurin, and nitrite reductase, are shown. If these are compared with the ground-state orbital of plastocyanin in Figure 6, a clear difference can be seen. In plastocyanin, there is an almost pure π* interaction between Cu and SCys. In nitrite reductase, however, this interaction is instead mainly of σ* character, and the other two proteins show a mixture of σ* and π* interactions. Thus, the ground state of the system has changed: for plastocyanin the Cu–SCys σ* interaction is found in the first excited state, whereas in the rhombic proteins a significant contribution of σ character is found in the ground-state singly occupied orbital. The singly occupied orbitals in the other excited states are not much changed. This directly explains the change in the intensity pattern of the spectrum. The π* antibonding orbital has a strong overlap with the corresponding π bonding orbital in the c ²A″ state, giving rise to the blue line in the spectrum, whereas the σ* antibonding orbital instead overlaps strongly with the corresponding σ bonding orbital, found in the d ²A′ state. As expected, this transition gives rise to the yellow band around 460 nm in the spectrum, the line that increases in intensity for the rhombic type 1 copper proteins.
In Table 5, the calculated and experimental excitation energies and oscillator strengths of plastocyanin and nitrite reductase are compared [34]. It can be seen that the error in the calculations is consistently below 1 800 cm⁻¹, i.e. within the error limits of the CASPT2 method [28]. Moreover, the calculations follow the experimental trend, i.e. that all excitations for nitrite reductase appear at a higher energy than the corresponding excitations for plastocyanin [85]. This reflects the stronger ligand field exerted in the tetragonal structure, with four instead of three strongly bound ligands. The intensity of the ligand-field states also reflects the change in ground state: the intensity of the b ²A″ state has dropped to zero, whereas the b and c ²A′ states gain intensity from the presence of a small amount of SCys 3pσ character in the corresponding singly occupied orbitals. The ground-state orbitals and the spectra of the other two proteins, cucumber basic protein and pseudoazurin, are intermediate between those of plastocyanin and nitrite reductase. Similarly, their structures, as described by the angle φ between the S–Cu–S and N–Cu–N planes, are also intermediate (φ is 82, 74, 70, and 61° for plastocyanin, pseudoazurin, cucumber basic protein, and nitrite reductase, respectively). Thus, there seems to be a correlation between the spectrum and the flattening of the copper geometry. This was investigated thoroughly for the Cu(NH3)2(SH)(SH2)+ model by calculating the spectrum at a number of φ angles, ranging from the ideal trigonal structure (φ = 90°) to a strictly square-planar structure (φ = 0°) [34].
The results are summarised in Figure 8, which shows how the Cu-SMet bond is shortened and the Cu-SCys bond is elongated as the structure goes from trigonal to square planar. At the same time, the ratio of the calculated oscillator strengths for the excitations around 460 and 600 nm goes from zero to infinity. This reflects that in the trigonal structure the c ²A″ state gives rise to the dominant blue band, whereas the d ²A′ state has little intensity. In the square-planar structure, the situation is reversed. The ground state is of Cu-SCys σ* character, and the Cu-SCys σ→σ* excitation has by far become the most intense, whereas the intensity of the c ²A″ state has almost completely vanished. Although the calculations were performed on a simple model, the results in Figure 8 nicely reflect the relationship between structure and electronic spectroscopy among the various types of copper-cysteinate proteins. The copper coordination geometry of axial type 1 proteins is close to trigonal, and their spectroscopic characteristics are reflected by the results obtained for φ > 80°. Rhombic type 1 proteins like pseudoazurin and cucumber basic protein, on the other hand, have φ angles between 70° and 80°. As can be seen from Figure 8, even at such a small
deviation from orthogonality, the 460-nm excitation has already gained significant intensity because σ character mixes into the ground-state singly occupied orbital. The largest deviation from orthogonality within the type 1 copper proteins is found for nitrite reductase from Achromobacter cycloclastes, which has φ = 56-65° [88]. At such angles, the second transition has become the most intense, in accordance with the green colour of nitrite reductase. The intensity of the blue band decreases further as the structure is flattened more, and to a first approximation the results obtained for the smallest φ angles in Figure 8 can be used to mimic the properties of type 2 copper-cysteinate (mutant) proteins, with their yellow colour. Our calculations on more realistic complexes [34,38] confirm these predictions. They show that the Cu-SCys σ→σ* excitation is blue-shifted in these models by more than 1 000 cm⁻¹, in agreement with the experimental shift of this band from 460 to 410 nm when going from type 1 to type 2 copper proteins [81]. The result in Figure 8 has led us to suggest that axial type 1 proteins have a trigonal structure with a π bond between Cu and SCys. The other three types of copper proteins instead have a tetragonal structure with mainly σ bonds to all four copper ligands. They differ in the flattening of the geometry, as described, for example, by the φ angle. Rhombic type 1 proteins, which are most distorted towards a tetrahedron, arise when one of the ligands forms a weak bond. If all ligands bind strongly but are still rather soft (e.g. histidine), type 1.5 sites arise, whereas harder ligands (e.g. water), preferably with two axial ligands, give the strongly flattened type 2 copper sites. It is notable that all sites form naturally, following the preferences of the copper ion and its ligands, and not by protein strain.

Figure 8. The variation of the Cu-SCys and Cu-SMet bond lengths and the quotient of the oscillator strengths of the peaks around 460 and 600 nm as a function of the φ twisting angle [34].
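The band positions above are quoted both in nanometres and in wavenumbers. The conversion ν̃ [cm⁻¹] = 10⁷ / λ [nm] offers a quick consistency check. The sketch below is only a unit conversion and is not part of the original analysis; the helper names are illustrative. It shows that the experimental shift of the σ→σ* band from 460 to 410 nm corresponds to roughly 2 650 cm⁻¹, which is consistent with the calculated blue shift of more than 1 000 cm⁻¹.

```python
def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1.0e7 / wavelength_nm


def band_shift_cm(from_nm: float, to_nm: float) -> float:
    """Wavenumber change (cm^-1) when a band moves from from_nm to to_nm; positive means a blue shift."""
    return nm_to_wavenumber(to_nm) - nm_to_wavenumber(from_nm)


# Band positions quoted in the text: the S(Cys) sigma band at 460 nm (type 1) and 410 nm (type 2),
# and the blue band at about 600 nm.
print(f"460 nm = {nm_to_wavenumber(460):.0f} cm^-1")
print(f"600 nm = {nm_to_wavenumber(600):.0f} cm^-1")
print(f"460 -> 410 nm shift = {band_shift_cm(460, 410):.0f} cm^-1")
```

The same helper also shows that the "around 22 000 cm⁻¹" excitation discussed for stellacyanin lies close to 455 nm, i.e. in the 460-nm band region.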
The only protein that does not fit into the above description is stellacyanin. The structure of this protein is clearly trigonal, with a φ angle of 84°, similar to plastocyanin. However, the ε460/ε600 ratio for stellacyanin is significantly higher than for plastocyanin, and its ESR characteristics are rhombic instead of axial. Two independent studies, by Solomon et al. and by us, recently discussed the structure and electronic characteristics of stellacyanin and gave quite different interpretations [33,86]. In Solomon's view, the stronger axial ligand in stellacyanin (a glutamine side-chain amide group, which binds closer to the copper ion than methionine, ~220 pm compared to 265-330 pm) should induce a stronger Jahn-Teller driving force. In this view, the fact that the copper surrounding in stellacyanin is not more strongly tetragonally distorted than in plastocyanin can only be explained by more protein strain. However, our calculations on the Cu(Im)2(SCH3)(OCCH3NH2)+ model show that its optimal geometry is trigonal and close to the crystal structure of cucumber stellacyanin (cf. Table 1) [33]. Strain is not needed, since a trigonal distortion can also lift the Jahn-Teller instability. Regarding the spectral characteristics, Solomon makes a clear distinction between stellacyanin and the other rhombic type 1 proteins by assigning the intensity-gaining band around 460 nm differently: a His→Cu charge-transfer excitation in stellacyanin, as opposed to an SCys pseudo-σ→Cu excitation in the other rhombic proteins. As already noted, our calculations did not reproduce the His→Cu band. However, our results indicate that the excitation out of the SCys σ orbital around 22 000 cm⁻¹ becomes significantly more intense in stellacyanin, at the expense of the blue band, in conformity with what was found for the other rhombic type 1 proteins.
The intensity-gaining mechanism in stellacyanin is not a decreasing φ angle but the stronger axial interaction with the glutamine side-chain amide group. This interaction causes more pronounced mixing of σ character into the ground-state singly occupied orbital, even in an almost strictly trigonal structure (see Figure 7d) [33]. Therefore, there is no need to invoke a His→Cu excitation to explain the increased ε460/ε600 ratio in stellacyanin.

4.3 The sensitivity of the calculated spectra to the theoretical method

Our studies of the spectra of blue copper proteins have taught us a lot about spectra calculations on metal complexes and their sensitivity to various parameters. First, the size of the ligand models is crucial. Imidazole should be used as a model of histidine, SCH3- for cysteine, S(CH3)2 for methionine, and CH3CONH2 for glutamine [33,36]. If imidazole is replaced by NH3, most excitation energies decrease by 800-1 900 cm⁻¹, and the ordering of the excitations may change. Likewise, if SCH3- is replaced by SH-, the excitation energies decrease by up to 5 200 cm⁻¹. On the other hand, substituting SH2 for S(CH3)2 increases all excitations by up to 6 800 cm⁻¹. Consequently, the results obtained with Cu(Im)2(SH)(SH2)+ and Cu(Im)2(SCH3)(S(CH3)2)+ are quite similar, and a set of correction factors can improve them [33,34,36]. Further replacing SCH3- with SC2H5- has a small effect on all excitation energies (less than 200 cm⁻¹). This is somewhat surprising, since Zerner et al. report changes of up to 2 100 cm⁻¹ in the spectrum of rubredoxin when the chromophore is modelled by Fe(SC2H5)4 instead of Fe(SCH3)4 [89]. Second, the geometry strongly influences the spectrum. The Cu-S distances are particularly crucial. If the Cu-SCys distance is decreased by 5 pm, all excitation energies increase by up to 2 000 cm⁻¹. Similarly, if the Cu-SMet bond length is increased by 10 pm, the excitation energies increase by up to 900 cm⁻¹, except for the excitation to the e ²A′ state (the charge transfer to methionine), which changes by 1 900 cm⁻¹ [36]. Therefore, to reproduce experimental excitation energies accurately, the two Cu-S distances must be reoptimised with the CASPT2 method [34]. Third, the effect of the surrounding protein and solvent molecules, estimated using a point-charge model, is appreciable and cannot be neglected. The general trends are the same for all proteins studied and can be related to the character of the transitions [33,34,36,39]. The excitation energies of the two SCys→Cu charge-transfer states increase by up to 2 800 cm⁻¹, whereas the ligand-field excitations, which involve an appreciable charge flow from Cu to SCys, decrease by almost 2 000 cm⁻¹. The effect on the lowest transition, which is essentially a transition within the Cu-SCys bond, is considerably smaller. However, if only details of the crystal structure change, e.g. the binding or exchange of the coenzyme (NADH) in Cu-substituted alcohol dehydrogenase, the variation in the spectra is limited to less than 300 cm⁻¹ [39]. It is also notable that the surroundings reduce all oscillator strengths by a factor of up to 1.75.
Finally, we have also investigated the influence of the basis sets, relativistic effects, and Cu 3s and 3p semicore correlation on the spectrum [36]. Somewhat unexpectedly, the spectrum is quite insensitive to the basis set. Extending it with double polarising functions on Cu and S and single polarising functions on C, N, and H changes the spectrum by less than 250 and 500 cm⁻¹ for the ligand-field and charge-transfer states, respectively, except for the charge-transfer state from methionine, which changes by 2 300 cm⁻¹ [36]. Likewise, relativistic effects and the Cu 3s and 3p correlation do not influence the spectrum very much (less than 800 cm⁻¹) [36]. However, the two effects act in the same direction, and they shift the ligand-field and the charge-transfer excitations in opposite directions (both effects favour states with a low Cu 3d population). Their combined effect may therefore significantly alter the relative energies of the excited states, so both are included in all reported excitation energies.
5. REORGANISATION ENERGIES
According to the semiclassical Marcus theory [6], the rate of electron transfer depends on the driving force of the reaction (ΔG°, set by the reduction potentials of the donor and acceptor), the electronic coupling matrix element (H_DA), and the reorganisation energy (λ):
$$k_{ET} = \frac{2\pi}{\hbar} H_{DA}^{2} \frac{1}{\sqrt{4\pi\lambda RT}} \exp\left(-\frac{(\Delta G^{0} + \lambda)^{2}}{4\lambda RT}\right) \qquad (1)$$
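Evaluating Eq. (1) directly shows how strongly the rate depends on λ. The sketch below works in per-molecule SI units. Molar energies are therefore divided by Avogadro's number, and RT in Eq. (1) becomes k_BT. The function name and the coupling H_DA = 0.1 cm⁻¹ used in the example are hypothetical choices for illustration and are not values from the text.

```python
import math

HBAR = 1.054571817e-34     # reduced Planck constant, J s
K_B = 1.380649e-23         # Boltzmann constant, J/K
N_A = 6.02214076e23        # Avogadro constant, 1/mol
CM1_TO_J = 1.98644586e-23  # energy of 1 cm^-1 (h*c), J


def marcus_rate(h_da_cm1: float, dg0_kj_mol: float, lam_kj_mol: float,
                temperature: float = 298.15) -> float:
    """Semiclassical Marcus rate constant (1/s) following Eq. (1), in per-molecule SI units."""
    h_da = h_da_cm1 * CM1_TO_J          # electronic coupling, J
    dg0 = dg0_kj_mol * 1.0e3 / N_A      # driving force, J per molecule
    lam = lam_kj_mol * 1.0e3 / N_A      # reorganisation energy, J per molecule
    kt = K_B * temperature              # k_B T plays the role of RT per molecule
    prefactor = (2.0 * math.pi / HBAR) * h_da**2 / math.sqrt(4.0 * math.pi * lam * kt)
    return prefactor * math.exp(-(dg0 + lam) ** 2 / (4.0 * lam * kt))


# Self-exchange (dG0 = 0) with a hypothetical coupling of 0.1 cm^-1:
# compare lambda = 100 kJ/mole with a doubled reorganisation energy.
k_low = marcus_rate(0.1, 0.0, 100.0)
k_high = marcus_rate(0.1, 0.0, 200.0)
print(f"k(100 kJ/mole) / k(200 kJ/mole) = {k_low / k_high:.2e}")
```

At room temperature, doubling λ from 100 to 200 kJ/mole slows a self-exchange reaction by a factor of about 3 × 10⁴, independent of the assumed H_DA. This illustrates why a low reorganisation energy matters so much for an electron-transfer protein.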
If the geometry of the active site and its surroundings does not change much during electron transfer, the reorganisation energy will be small and the reaction will be fast. Reducing the reorganisation energy is therefore of vital importance for an electron-transfer protein. For convenience, the reorganisation energy is usually divided into two parts: the inner-sphere reorganisation energy, which is associated with the structural change of the first coordination sphere, and the outer-sphere reorganisation energy, which involves structural changes of the remaining protein as well as the solvent. Several groups have tried to estimate the reorganisation energy for transition-metal complexes and proteins using theoretical methods of variable sophistication and with varying success [51,90-101]. However, we seem to be the only group that has systematically studied models relevant to the blue copper proteins. We have estimated inner-sphere reorganisation energies by calculating the energy difference of the reduced model between the optimum geometries of the reduced and oxidised complexes, or vice versa [68]. For our best model of plastocyanin, Cu(Im)2(SCH3)(S(CH3)2)+/0, we obtain an inner-sphere reorganisation energy of 62 kJ/mole, whereas models of the rhombic type 1 proteins nitrite reductase and stellacyanin have slightly larger values, 78 and 90 kJ/mole, respectively. Comparing these values with experimental data is far from trivial. First, we need an estimate of the outer-sphere reorganisation energy. However, this quantity depends strongly on the geometry of the docking complex of the donor and acceptor proteins in the electron-transfer reaction of interest, and it is unlikely to be additive for different reactions. Using Marcus' combination rules [6] to obtain reorganisation energies for reactions that have not been studied experimentally is therefore highly questionable [102-107].
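The inner-sphere estimate described above can be assembled from four single-point energies. The sketch below assumes the common four-point scheme. In that scheme, the self-exchange value is the sum of two terms: the cost of distorting the reduced complex to the oxidised geometry, and the cost of distorting the oxidised complex to the reduced geometry. The class and function names are hypothetical. The energies in the example are made-up numbers chosen only to illustrate the bookkeeping and are not results from [68].

```python
from dataclasses import dataclass


@dataclass
class FourPointEnergies:
    """Single-point energies (kJ/mole, any common zero) of a redox couple at both optimised geometries."""
    red_at_red: float  # reduced complex at its own optimum geometry
    red_at_ox: float   # reduced complex at the oxidised optimum geometry
    ox_at_ox: float    # oxidised complex at its own optimum geometry
    ox_at_red: float   # oxidised complex at the reduced optimum geometry


def inner_sphere_lambda(e: FourPointEnergies) -> float:
    """Self-exchange inner-sphere reorganisation energy in the four-point scheme (assumed here)."""
    lam_red = e.red_at_ox - e.red_at_red  # distortion cost of the reduced state
    lam_ox = e.ox_at_red - e.ox_at_ox     # distortion cost of the oxidised state
    return lam_red + lam_ox


# Hypothetical relative energies (kJ/mole), for illustration only
example = FourPointEnergies(red_at_red=0.0, red_at_ox=29.0, ox_at_ox=0.0, ox_at_red=33.0)
print(inner_sphere_lambda(example))
```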
The other terms in the Marcus equation, the reduction potential and the coupling constant, also change when the docking complex is formed [108]. Reliable comparisons can therefore be made only when calculations and experiments are performed on the same electron-transfer reaction. Nevertheless, we can get a crude idea of the relation between the calculated and measured reorganisation energies in the following way. The outer-sphere reorganisation energy of three tentative configurations of the docking complex between plastocyanin and its natural electron donor, cytochrome f, has been estimated by force-field methods and numerical solution of the Poisson-Boltzmann equation [99]. The best estimate is 42 kJ/mole. Combining it with our calculated inner-sphere reorganisation energy for plastocyanin gives an approximate total self-exchange reorganisation energy of 100 kJ/mole. (To a good approximation, inner-sphere reorganisation energies can be expected to be additive, since they do not depend on the conformation of the docking complex.) This energy is slightly lower than the experimentally measured reorganisation energy for plastocyanin (120 kJ/mole) [102]. The reorganisation energy of azurin, the best studied blue copper protein [103-107], is slightly lower (about 80 kJ/mole), but azurin, with its bipyramidal copper site, probably has a lower reorganisation energy than the pyramidal site in plastocyanin [68]. Recently, Loppnow and Fraga [109] estimated the reorganisation energy for plastocyanin by analysing resonance Raman intensities. They obtain an inner-sphere reorganisation energy of 18 kJ/mole, which is significantly lower than ours [105]. However, their value represents the reorganisation energy of charge transfer during the excitation to the intense blue line. As shown above, only about 0.2 e is transferred during this excitation (and only from SCys to Cu), so it has little to do with the reorganisation energy during electron transfer of plastocyanin
[68]. We have also investigated how the blue copper proteins achieve a low reorganisation energy. As can be seen in Figure 9, a six-coordinate Cu(H2O)6+/2+ complex has a rather small reorganisation energy, 112 kJ/mole. However, Cu(I) cannot stabilise such a high coordination number. If the reduced complex is allowed to relax to its preferred coordination number, the reorganisation energy of Cu(H2O)6+/2+ increases strongly, to 336 kJ/mole. Lowering the number of ligands to four gives a rather high reorganisation energy, 186-247 kJ/mole for Cu(H2O)4+/2+, depending on whether or not the reduced complex is allowed to relax to a lower coordination number. The low coordination number of the copper ion in the proteins is thus unfavourable for the reorganisation energy, but it is necessary because Cu(I) normally does not bind more than four ligands. Instead, the blue copper proteins achieve their low reorganisation energy through a proper choice of ligands. Nitrogen ligands give an appreciably lower reorganisation energy than water (by 50 kJ/mole), owing to the lower Cu-N force constant. A methionine ligand lowers the reorganisation energy further (by 14 kJ/mole) because of its weaker Cu-S bond. The cysteine ligand decreases the reorganisation energy even more, by 45 kJ/mole, although the Cu-SCys force constant is appreciably higher than that of Cu-N. This decrease is caused by the transfer of charge from the negatively charged thiolate group to Cu(II), which makes the oxidised and reduced structures quite similar. The effects of the methionine and cysteine ligands are approximately additive, so the Cu(NH3)2(SH)(SH2)+/0 complex has a reorganisation energy of 74 kJ/mole. Finally, for the trigonal Cu(NH3)2(SH)(SH2)+/0 complex (all the other complexes have been tetragonal), the oxidised structure is even closer to the reduced one, so the reorganisation energy is only 66 kJ/mole. More realistic models decrease the reorganisation energy by another 4 kJ/mole, which gives the estimate discussed above. We can therefore conclude that the inner-sphere reorganisation energy of our blue copper models is similar to that in the proteins. This indicates that the proteins do not alter the reorganisation energy to any significant degree, i.e. protein strain is not important for the low reorganisation energies of the blue copper proteins. On the contrary, an important mechanism by which the blue copper site reduces the reorganisation energy is the flexible bond to the methionine ligand, which can change its geometry at virtually no cost [54,68]. This mechanism is actually the antithesis of the strain hypotheses, which suggest that the rigid protein lowers the reorganisation energy by obstructing any change in geometry.

Figure 9. The inner-sphere self-exchange reorganisation energy of a number of complexes related to the blue copper proteins. The hatched bars indicate the reorganisation energy obtained when the reduced structure preferred a lower coordination number than the oxidised structure [68].
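The ligand effects on the reorganisation energy discussed above can be chained into a rough additivity check. This is plain arithmetic on the numbers quoted in the text. It assumes that the 50 kJ/mole nitrogen effect is measured from the non-relaxed Cu(H2O)4+/2+ value of 186 kJ/mole, and that the methionine and cysteine effects each apply to one further ligand; the text does not state either point explicitly. The constant and function names are illustrative.

```python
# Reorganisation energies and ligand increments (kJ/mole) quoted in the text.
# Assumption: the nitrogen increment is applied to the non-relaxed Cu(H2O)4+/2+ value.
CU_H2O4 = 186.0
INCREMENTS = {
    "water -> nitrogen ligands": -50.0,
    "one NH3 -> methionine model (SH2)": -14.0,
    "one NH3 -> cysteine model (SH-)": -45.0,
}
REPORTED_TETRAGONAL = 74.0   # Cu(NH3)2(SH)(SH2)+/0, tetragonal
REPORTED_TRIGONAL = 66.0     # Cu(NH3)2(SH)(SH2)+/0, trigonal
REALISTIC_CORRECTION = -4.0  # effect of more realistic ligand models


def additive_estimate(base: float, increments: dict) -> float:
    """Reorganisation energy obtained by assuming the ligand effects simply add up."""
    return base + sum(increments.values())


estimate = additive_estimate(CU_H2O4, INCREMENTS)
print(f"additive estimate: {estimate:.0f} kJ/mole (reported: {REPORTED_TETRAGONAL:.0f})")
print(f"trigonal model with realistic ligands: {REPORTED_TRIGONAL + REALISTIC_CORRECTION:.0f} kJ/mole")
```

Under these assumptions, the additive estimate lands within a few kJ/mole of the reported tetragonal value. The last line recovers the 62 kJ/mole quoted for the plastocyanin model.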
6. REDUCTION POTENTIALS
The reduction potential is central to the function of electron-transfer proteins, since it determines the driving force of the reaction. In particular, it must be poised between the reduction potentials of the donor and acceptor species. Electron-transfer proteins therefore normally have to modulate the reduction potential of the redox-active group. This is very evident for the blue copper proteins, which show reduction potentials ranging from 184 mV for stellacyanin to about 1000 mV for the type 1 copper site in domain 2 of ceruloplasmin [1,110,111].
These two copper sites are atypical: stellacyanin has a glutamine amide oxygen atom as the axial ligand (instead of methionine), whereas the ceruloplasmin centre has no axial ligand at all (leucine replaces the methionine ligand). However, even blue copper proteins with the typical CuHis2CysMet ligand sphere have reduction potentials between 260 and 680 mV (e.g. pseudoazurin and rusticyanin, respectively) [1,63], although they share the same active site. The reduction potentials of the blue copper proteins are also clearly high. They are higher than for most other electron-transfer proteins (-700 to +400 mV for iron-sulphur clusters [112] and -300 to +470 mV for cytochromes [113,114]) and higher than for a copper ion in aqueous solution (+150 mV [115]). The reason for these high potentials and their great variation has been much discussed. Originally, the entatic state and the induced rack hypotheses suggested that protein strain causes the high potential. They proposed that the protein forces Cu(II) to bind in a geometry more similar to that preferred by Cu(I). Cu(II) should thus be destabilised, which would increase the reduction potential [10,12]. This effect has been observed for inorganic complexes with strained ligands [115]. Recently, however, Malmström and Gray showed that the reduction potential of denatured azurin is higher than that of the native protein [105,116,117]. This shows that the reduced copper site gains more from unfolding than the oxidised site, especially as unfolding would increase the solvent accessibility of the site, thereby favouring Cu(II) and lowering the reduction potential. Consequently, the overall effect of the folding of the protein is a lowering of the reduction potential [117], i.e. the opposite of what the strain hypotheses originally suggested. This is in line with the suggestion by Solomon and co-workers that the protein constrains only the Cu(I)-SMet bond [13].
A normal Cu(I)-SMet bond length is about 230 pm, whereas the length observed in the blue copper proteins is around 290 pm. Such an elongation can be predicted to reduce significantly the donation of charge from the ligand to the copper ion, which would increase the reduction potential. In fact, density functional Xα calculations indicate that this elongation would increase the reduction potential by more than 1000 mV [13]. Malmström et al. have extended this hypothesis to include other axial ligands as well [117,118]. They point out that stellacyanin has the strongest axial ligation among the blue copper proteins (a glutamine amide group at a distance of ~220 pm) and also the lowest reduction potential. Azurin has two axial ligands at distances around 310 pm and a higher reduction potential (285-310 mV). In plastocyanin, the Cu-O distance increases to about 390 pm, and so does the reduction potential (to 380 mV). In rusticyanin, the Cu-O distance is even greater, 590 pm, and the carbonyl oxygen no longer points towards the copper site. This correlates with a high reduction potential of 680 mV (however, they disregard the compensating shortening of the Cu-SMet bond in plastocyanin and rusticyanin).
Figure 10. The calculated potential energy surface of the Cu-SMet bond (230-310 pm) in the Cu(Im)2(SCH3)(S(CH3)2)+/0 complexes [35]. Each oxidation state has two curves, one in vacuum and one in water (calculated with the CPCM method). The actual potential in any protein can be expected to lie between these two extreme cases. Reduction potentials can be obtained from the difference between the curves of the oxidised and reduced complexes, combined with an assumption about whether the Cu-SMet bond is constrained in the oxidised state, the reduced state, or both [68]. Note that 1 kJ/mole = 10.4 mV.

Finally, in fungal laccase and ceruloplasmin, which have the highest known reduction potentials (750-1000 mV), a leucine replaces the methionine ligand, yielding a three-coordinate copper site. They therefore propose that the protein fold dictates the reduction potential of the copper site by varying the strength of the axial ligation [117,118]. We have examined these suggestions with several types of calculations. First, we used free energy perturbations to estimate the maximum strain energy that plastocyanin or nitrite reductase can mobilise to resist a certain copper geometry [54]. These calculations show that the proteins are quite indifferent to the Cu-SMet bond length. Changing the length of this bond between the values observed in different crystal structures or in optimised vacuum models costs less than 5 kJ/mole. This energy is at least a factor of two too low to explain the observed differences in the Cu-SMet bond length. Instead, the difference between the calculated and observed Cu-SMet bond lengths seems to be caused by systematic errors in the theoretical method, dynamic effects, and solvation effects [26,35,54], as discussed above.
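The conversion factor in the Figure 10 caption follows from ΔE = ΔG/(nF) with n = 1. The sketch below is only a unit conversion with an illustrative function name. It reproduces that factor and turns the energy bounds discussed in this section into potentials, for example 10 kJ/mole into roughly 100 mV.

```python
FARADAY = 96485.33212  # Faraday constant, C/mol


def kj_per_mole_to_mv(delta_g_kj: float, n_electrons: int = 1) -> float:
    """Magnitude of the potential change (mV) equivalent to a free-energy change (kJ/mole)."""
    return abs(delta_g_kj) * 1.0e3 / (n_electrons * FARADAY) * 1.0e3


# Energies quoted in this section: the conversion factor, the strain bound, and the PES bound
for energy in (1.0, 5.0, 10.0):
    print(f"{energy:4.1f} kJ/mole -> {kj_per_mole_to_mv(energy):5.1f} mV")
```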
Second, quantum chemical calculations of the potential energy surface of the Cu-SMet bond show that changing the Cu-SMet bond length by 100 pm around its optimum value costs less than 10 kJ/mole (see Figure 10). This range is larger than the natural variation in this bond [14,54]. Thus, even if the proteins could constrain this bond, it would affect the electronic part of the reduction potential by less than 10 kJ/mole, or 100 mV, i.e. much less than the variation found among the blue copper proteins. Moreover, a constrained Cu(I)-SMet bond would
destabilise the reduced state and therefore decrease the reduction potential. This contradicts both the suggestion of a raised potential [119] and the fact that the blue copper proteins are characterised by high reduction potentials. However, the reduction potential also has contributions other than the electronic part, most prominently the solvation energy of the active site caused by the surrounding protein and solvent. We have therefore studied the reduction potential of the blue copper proteins using various methods to include the solvation effects. The results show that constraints in the Cu-SMet bond length can affect the reduction potential by less than 70 mV (cf. Figure 10) [35]. Furthermore, we have tested the suggestion [63,118] that the reduction potential is determined by the axial backbone carbonyl ligand or by replacements of the methionine ligand (by glutamine in stellacyanin or leucine in ceruloplasmin and laccase). Again, our results show that the potential energy surfaces of the axial ligands are too soft to account for the variation in reduction potential among the blue copper proteins, even when solvation effects are taken into account (the total effect is less than 140 mV) [35]. This agrees with mutation studies of the axial methionine ligand in azurin [120], which show that most substitutions give only modest changes (less than 60 mV). The largest effects are found for mutations to hydrophobic residues, which increase the reduction potential by up to 140 mV, and for mutations that change the structure of the copper site [121]. There must therefore be other reasons for the high potentials of the blue copper proteins. Studies of small inorganic models [63,115,122] have shown that anionic ligands lower the potential, whereas sulphur and nitrogen π-acceptor ligands raise it. Our calculations of the reduction potentials of a number of blue-copper models confirm this [45].
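The softness of the Cu-SMet bond discussed above can be made concrete by converting the energy bound into an upper limit on an effective harmonic force constant. The sketch assumes a parabola centred at the optimum, so that a 100-pm range corresponds to ±50 pm and E = ½kΔx². The harmonic form is a simplifying assumption introduced here; the calculated surfaces in Figure 10 are not strictly parabolic. The function name is illustrative.

```python
AVOGADRO = 6.02214076e23  # Avogadro constant, 1/mol


def max_force_constant(energy_kj_mol: float, displacement_pm: float) -> float:
    """Largest harmonic force constant (N/m) with 0.5*k*dx**2 <= energy at displacement dx."""
    energy_j = energy_kj_mol * 1.0e3 / AVOGADRO  # per-molecule energy in J
    dx_m = displacement_pm * 1.0e-12             # displacement in m
    return 2.0 * energy_j / dx_m**2


# Less than 10 kJ/mole over a 100-pm range, i.e. +/-50 pm around the optimum
print(f"k(Cu-SMet) < {max_force_constant(10.0, 50.0):.0f} N/m")
```

The resulting bound of roughly 13 N/m shows how little restoring force the bond offers. A protein therefore gains little electronic leverage on the potential by constraining it.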
Replacing an ammonia ligand in Cu(NH3)4+/2+ by SH2 increases the potential by 0.7 V, whereas SH- decreases it by 0.3-0.5 V. If both ligands are included in the complex, Cu(NH3)2(SH)(SH2)+/0, the potential hardly changes relative to the Cu(NH3)4+/2+ complex. The same is true for more realistic ligands (Cu(Im)2(SCH3)(S(CH3)2)+/0). A tetragonal model of the rhombic blue copper proteins has a slightly larger reduction potential than the trigonal model (by 0.07 V), but it is not clear whether this difference is significant. Moreover, other effects are as important as the ligands. The dielectric properties of the protein matrix are very different from those of water. It has often been argued that the protein behaves as a medium with a low dielectric constant (around 4, compared to 80 in water) [47,123,124]. Figure 11 shows that this causes a very prominent change in the reduction potential of a blue-copper site [45]. The potential increases by 0.8-1 V as the site is moved from water solution to the centre of a protein with a radius of 1.5 nm (like plastocyanin) or 3.0 nm (like an azurin tetramer). The site also does not need to be at the centre of the protein for the full effect. Already at the surface of the protein, 80% of the maximum effect is seen. When the site is 0.5 nm from the surface (as is typical for the blue copper proteins), the change in the reduction potential is 90% of the maximum. The reduction potential of the blue-copper site in the protein will therefore be 0.6-0.9 V higher than in water solution. This agrees with a 0.5-V variation in the cytochrome reduction potentials depending on the solvent exposure of the haem group [125]. This effect alone explains the high reduction potential of the blue copper proteins compared to copper in aqueous solution. Naturally, details of the protein matrix, i.e. the presence and direction of protein dipoles and charged groups around the copper site, also strongly influence the reduction potential [53,126]. In fact, a single water molecule 0.45 nm from the copper ion may change the potential by 0.2 V, and backbone amide groups may have similar effects [53]. The water accessibility and the packing of hydrophobic residues have also been shown to influence the reduction potential significantly. It has even been suggested that the protein may modify the reduction potential by more than 1 V without any changes in the redox-active group [52]. In view of these results, the large variation of the reduction potentials of blue copper proteins is not surprising, even if the detailed mechanism remains to be revealed for most proteins [45,53,126].

Figure 11. The reduction potential of the Cu(Im)2(SCH3)(S(CH3)2)+/0 complex as a function of the size of the protein (1.5 or 3.0 nm radius) and the distance between the copper ion and the centre of the protein [45]. The protein was modelled by a sphere with a low dielectric constant (4) surrounded by water (ε = 78.39), and the copper site as a collection of point charges taken from quantum chemical calculations. The potentials were calculated with the MEAD program.
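The Born model, which treats the complex as a single charged sphere, illustrates why a low-dielectric protein raises the potential. This is a deliberately cruder substitute for the MEAD continuum calculation behind Figure 11. It ignores the charge distribution of the site and the finite size of the protein. The cavity radius a is a hypothetical parameter, so only the sign and rough magnitude of the shift are meaningful. The function name is illustrative.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m


def born_shift_volts(radius_nm: float, eps_protein: float = 4.0, eps_water: float = 78.39,
                     q_ox: int = 1, q_red: int = 0) -> float:
    """Born-model estimate of E(protein) - E(water) in V for a one-electron couple q_ox -> q_red."""
    a = radius_nm * 1.0e-9
    scale = E_CHARGE / (8.0 * math.pi * EPS0 * a)  # Born energy per elementary charge, in volts

    def solvation(q: int, eps: float) -> float:
        # Born solvation free energy per electron, in eV
        return -scale * q**2 * (1.0 - 1.0 / eps)

    def dg_reduction(eps: float) -> float:
        # Solvation contribution to the one-electron reduction free energy, in eV
        return solvation(q_red, eps) - solvation(q_ox, eps)

    # E = -dG/F, so the potential shift is the decrease in dG on going from water to protein
    return dg_reduction(eps_water) - dg_reduction(eps_protein)


for radius in (0.3, 0.5):
    print(f"a = {radius} nm: shift = {born_shift_volts(radius):.2f} V")
```

For plausible radii the Born estimate gives a few tenths of a volt. This is smaller than the 0.8-1 V in Figure 11, which is unsurprising for such a crude model. The point is only the sign: moving a +1/0 couple from ε = 78.39 into ε = 4 removes more stabilisation from the charged oxidised form than from the neutral reduced form, and so raises the potential.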
7. RELATED PROTEINS
7.1 The binuclear CuA site
Cytochrome c oxidase is the terminal oxidase in both prokaryotic and eukaryotic cells and is responsible for the generation of cellular energy via oxidative phosphorylation [127]. It couples the catalytic four-electron reduction of O2 to
water to transmembrane proton pumping, which can be used for ATP synthesis and long-range electron transfer. The active site is a haem a3-CuB binuclear site, whereas a second haem a and an additional copper site, CuA, serve as electron-transfer intermediates between cytochrome c and the active site. The CuA site shows many similarities with the blue copper proteins. The structure of cytochrome c oxidase was recently determined by crystallography [128-130]. This settled an old controversy regarding the geometry of the CuA site [131,132], showing that it is a binuclear site bridged by two cysteine thiolate groups. Each copper ion is also bound to a histidine group and a weaker axial ligand: a methionine sulphur atom for one copper and a backbone carbonyl group for the other. The Cu-Cu distance is very short, ~245 pm [133], and it has been speculated that it represents a covalent bond [134-136]. A similar site is found in nitrous oxide reductase, a terminal oxidase that converts N2O to N2 in denitrifying bacteria [137]. During electron transfer, the CuA site alternates between the fully reduced and the mixed-valence (CuI + CuII) forms. Interestingly, the unpaired electron in the mixed-valence form seems to be delocalised between the two copper ions. Several theoretical investigations of the electronic structure and spectrum of the CuA dimer have been published [138-144]. As for the blue copper proteins, it has been suggested that protein strain determines the structure and properties of the CuA site. More precisely, it has been proposed [136] that CuA in its natural state is similar to an inorganic model studied by Tolman and co-workers [145]. This complex has a long Cu-Cu bond (293 pm) and short axial interactions (~212 pm). The protein is said to enforce weaker axial interactions, which are compensated by shorter bonds to the other ligands and the formation of a Cu-Cu bond.
This should allow the protein to modulate the reduction potential of the site [136,146]. We have studied the structure, reorganisation energy, and reduction potential of the CuA site with the same theoretical methods as for the blue copper proteins [147]. The mixed-valence state is the most studied state of CuA experimentally. Our optimised structure of (S(CH3)2)(Im)Cu(SCH3)2Cu(Im)(CH3CONHCH3)+ is very similar to the available experimental data [128-130,133,148-151] (cf. Figure 12 and Table 6). The Cu-Cu distance is 248 pm, 2-5 pm longer than the value obtained from extended x-ray absorption fine structure (EXAFS) measurements, and the Cu-SCys distances are 231-235 pm (~2 pm longer than the EXAFS results). Even the distances to the axial ligands are within the experimentally observed range: 245 pm for the methionine ligand and 220 pm for the backbone carbonyl group. The difference in the Cu-NHis distances seems to be slightly larger, 6-7 pm, which is probably due to hydrogen-bond interactions in the protein [147]. It has been noted that some inorganic models of the CuA site have an appreciably longer Cu-Cu distance (~290 pm) [145]. This is accompanied by a change in the electronic state: in the proteins there is a σ* antibonding interaction between
the copper ions in the singly occupied orbital (an orbital of B3u symmetry for an idealised D2h Cu2S2 core), whereas the model instead has a Cu-Cu π bonding interaction (B2u symmetry) [136,138,141,143]. We have also optimised the π-bonded electronic state. It is characterised by a Cu-Cu bond length of 310 pm and a slightly larger variation in the Cu-SCys distances (226-236 pm), whereas the other geometric parameters are similar to those of the σ-bonded structure. In particular, the bond lengths to the axial ligands do not differ significantly. It is therefore unlikely that variations in the axial interactions caused by the protein could change the electronic state of CuA. Interestingly, the two structures are almost degenerate; their energies agree within 2 kJ/mole. In fact, the full potential surface for the Cu-Cu interaction is extremely flat. As can be seen in Figure 13, the barrier between the two electronic states is less than 5 kJ/mole, and the Cu-Cu distance can vary over 100 pm (240-340 pm) at a cost of less than 5 kJ/mole, both in vacuum and in water solution. There is thus no indication that the CuA site is significantly strained. The difference between the protein structures and the mixed-valence model has two causes. First, the two electronic states are nearly degenerate, so small differences in the surrounding protein may stabilise either structure. Second, the inorganic complex involves poor models of the histidine and axial ligands (four amine groups at almost the same distances, 211-212 pm). This illustrates the danger of relying on inorganic complexes with poor ligand models; if such models had been used in theoretical calculations, nobody would have believed in them.

Figure 12. The optimised geometry of the σ-bonded structure for the (S(CH3)2)(Im)Cu(SCH3)2Cu(Im)(CH3CONHCH3)+ complex [147] compared to the crystal structure of the CuA site in cytochrome c oxidase (shaded and without any hydrogen atoms) [149].
The two electronic states differ in the localisation of the unpaired electron: in the σ* state, the electron is delocalised over the whole system, whereas in the π state, it is more localised to one copper ion. Our calculations reproduce this movement of the electron: in the system with a long Cu-Cu bond, the electron is mainly localised to the copper ion with the methionine ligand. However, the electronic structure is quite flexible, as experiments with engineered CuA sites have shown [146,153,154].
Figure 13. The calculated potential energy surface for the Cu-Cu interaction of the (S(CH3)2)(Im)Cu(SCH3)2Cu(Im)(CH3CONHCH3)+ complex in vacuum and in water [147]. (Plot: energy versus Cu-Cu distance, 230-330 pm, for the fully reduced and mixed-valence states in vacuum and in solvent.)
The optimum structure of the fully reduced state of our CuA model is also shown in Table 6. Most Cu-ligand bond lengths increase by 1-7 pm upon reduction, but the Cu-Cu distance increases by 9 pm and the Cu-O distance by as much as 30 pm. No crystal structure has been published for this oxidation state, but EXAFS data are available [133]. Table 6 shows that our optimised structure is quite close to these results, with the same trends as for the mixed-valence structure (i.e. slightly too long Cu-Cu and Cu-N bonds). Our calculations therefore reproduce the changes upon reduction observed by EXAFS very well, e.g. the change in the Cu-Cu distance. We also reproduce the larger variation in the Cu-S distances (233-247 pm). Consequently, the calculated reorganisation energies can be expected to be quite accurate. For the reduction of the σ* state, we predict a self-exchange inner-sphere reorganisation energy of 43 kJ/mole [147]. This is 20 kJ/mole lower than for plastocyanin [68]. It has been speculated that the reorganisation energy of CuA should be half as large as for a blue-copper site owing to the delocalised electron [136,144,155,156], and older estimates of the reorganisation energy of CuA were in general quite low, 15-50 kJ/mole [157,158]. However, recent experiments indicate that the reorganisation energy is of the same size as for the blue copper proteins, around 80 kJ/mole [159]. If the outer-sphere reorganisation energy of cytochrome c oxidase is of the same magnitude as for plastocyanin (~40 kJ/mole [99]), our calculated reorganisation energy is in good agreement with this latter experiment.
The reorganisation energy for reduction of the π state is appreciably higher, 69 kJ/mole, owing to the change in the Cu-Cu bond length and in the angles of the CuS2Cu core [147].
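The agreement claimed for the σ* state rests on simple additive bookkeeping: total self-exchange reorganisation energy = inner-sphere part + outer-sphere part. The minimal Python sketch below restates that arithmetic with the numbers quoted above; transferring the plastocyanin outer-sphere estimate (~40 kJ/mole) to cytochrome c oxidase is the same assumption made in the text, not a computed result, and the variable names are illustrative only.

```python
# Reorganisation energies quoted in this section (kJ/mole).
INNER_CUA_SIGMA = 43.0     # CuA model, sigma* state [147]
INNER_PLASTOCYANIN = 62.0  # blue-copper (plastocyanin) model [68]
OUTER_ASSUMED = 40.0       # plastocyanin outer-sphere estimate, assumed transferable [99]
EXPERIMENT_CUA = 80.0      # recent experimental estimate for CuA [159]


def total_reorganisation(inner: float, outer: float) -> float:
    """Total reorganisation energy as inner-sphere plus outer-sphere contribution."""
    return inner + outer


total_cua = total_reorganisation(INNER_CUA_SIGMA, OUTER_ASSUMED)
lowering = INNER_PLASTOCYANIN - INNER_CUA_SIGMA  # inner-sphere gain of the dimer

print(f"CuA total: {total_cua:.0f} kJ/mole vs. experiment ~{EXPERIMENT_CUA:.0f} kJ/mole")
print(f"Inner-sphere lowering relative to plastocyanin: {lowering:.0f} kJ/mole")
```

With these inputs the sketch gives 83 kJ/mole, within a few kJ/mole of the ~80 kJ/mole measured, and a lowering of 19 kJ/mole (the "20 kJ/mole" of the text) relative to plastocyanin.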
Table 6. Bond distances in four electronic states of the (S(CH3)2)(Im)Cu(SCH3)2Cu(Im)(CH3CONHCH3) model [147] compared to experimental data for CuA and model compounds. All distances in pm.
Oxidation  Electronic   Cu-Cu    Cu-SCys  Cu-NHis  Cu-SMet  Cu-O
state      state
I+I        -            257      233-247  207-211  240      250
           EXAFS(a)     251-252  231-238  195-197
II+I       σ*           248      231-235  202-209  245      220
           π            310      227-236  203-210  242      219
           EXAFS(a)     243-246  229-233  195-203
           crystal(b)   220-258  207-244  177-211  239-302  219-300
           model(c)     293      225-229  211      212      212
II+II      -            342      228-234  202-203  242      202
           model(d)     334      233      206      210-226  210-226
(a) EXAFS data [133,151]. (b) Protein crystal structures [128-130,148-150]. (c) A mixed-valence inorganic model synthesised by Tolman and coworkers [145] with a π ground state. Note that both the histidine and axial ligands are amine groups in the model. (d) Another, fully oxidised, inorganic model synthesised by Tolman and coworkers [152]. Note that each copper ion is five-coordinate with three amine nitrogen ligands.
The potential energy surface for the Cu-Cu bond in the reduced CuA model is almost as flat as in the mixed-valence state (Figure 13). Therefore, constraints on this bond cannot change the reduction potential of the CuA site by more than 100 mV. In particular, a change in the electronic structure of the mixed-valence state from π to σ* does not change the reduction potential by more than 13 mV. Solvation effects alter these results by less than 20 mV (Figure 13). Similarly, the potential energy surfaces of the Cu-SMet and Cu-O bonds are also flat (Figure 14). The two bond lengths can vary over a range of almost 100 pm at an energy cost of less than 8 kJ/mole. As for the blue copper proteins, the optimum distance for the carbonyl group is shorter than for the methionine ligand. Thus, it is unlikely that the axial ligands determine the reduction potential of the CuA site [136,146,155].
Even if the protein could constrain these distances, the results in Figure 14 show that the reduction potential would vary by less than 80 mV over the experimentally observed range of these bond lengths. Inclusion of solvation effects does not change the situation significantly [147].
Finally, we have also studied the fully oxidised CuA model (CuII+CuII). This state has not yet been unambiguously observed in biology, but it has been suggested to be responsible for the differing characteristics of the CuZ site in nitrous oxide reductase [146,160]. Our calculations indicate that the fully oxidised state should have a much longer Cu-Cu bond (342 pm) and a shorter Cu-O bond (202 pm) than the mixed-valence state. This is reasonably similar to a fully oxidised inorganic model complex with bridging thiolate groups and three amine nitrogen ligands on each copper ion (Table 6) [152]. In particular, the angles in the CuS2Cu core are very similar, 85° compared to 83° for the S-Cu-S angle. This is quite different from the angle in the mixed-valence σ* state (115°). Therefore, the self-exchange reorganisation energy for the oxidation of this state is high, 133 kJ/mole [147]. Interestingly, the π-bonded structure is more similar to the fully oxidised state, and the corresponding reorganisation energy is also appreciably lower, 90 kJ/mole. It has been suggested that the CuA and CuZ sites in nitrous oxide reductase are in fact the same site, with properties altered by a conformational change [146,160]. If this suggestion is correct, it follows from our results that the conformational change may stabilise the σ* structure in the CuA site, but the π structure in the CuZ site. This would cost hardly any energy (Figure 13), but it would strongly reduce the inner-sphere reorganisation energy for oxidation of the mixed-valence state [147].
Figure 14. The calculated potential energy surfaces for the Cu-SMet and Cu-O interactions of the (S(CH3)2)(Im)Cu(SCH3)2Cu(Im)(CH3CONHCH3)+ complex in vacuum [147]. (Plot: energy versus distance, 210-310 pm, for the Cu-O and Cu-S bonds in the reduced and oxidised states.)
In conclusion, the properties of the CuA dimer are very similar to those of the blue copper proteins. Each copper ion has a trigonal structure with a weakly bound axial ligand. There are two nearly degenerate electronic states, which, together with the flat potentials of the axial ligands, give a very plastic site. The inner-sphere reorganisation energy is slightly lower than for the blue copper proteins, and it is achieved by the same mechanisms: delocalisation of the charge between the copper and sulphur ions, and flexible bonds to the axial ligands. As for the blue copper proteins, we have not seen any evidence for protein strain in the CuA site.
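The potential shifts quoted for the CuA site (100, 80 and 13 mV) are energy differences expressed on the electrochemical scale through |ΔE| = |ΔG|/(nF). The short Python sketch below performs that conversion for a one-electron process; only magnitudes are computed, and the function names are illustrative, not taken from any program used in the original work.

```python
FARADAY = 96485.33212  # C/mol (SI 2019 value)


def potential_shift_mv(energy_kj_per_mol: float, n_electrons: int = 1) -> float:
    """Magnitude of the potential shift (mV) caused by an energy change (kJ/mole)."""
    return energy_kj_per_mol * 1.0e3 / (n_electrons * FARADAY) * 1.0e3


def energy_for_shift_kj(shift_mv: float, n_electrons: int = 1) -> float:
    """Energy change (kJ/mole) that corresponds to a potential shift (mV)."""
    return n_electrons * FARADAY * shift_mv * 1.0e-3 * 1.0e-3


for mv in (13.0, 80.0, 100.0):
    print(f"{mv:5.0f} mV  <->  {energy_for_shift_kj(mv):4.2f} kJ/mole")
```

A shift of 100 mV thus corresponds to less than 10 kJ/mole, and the 13 mV bound for the π to σ* switch to about 1.3 kJ/mole, in line with the near-degeneracy (within 2 kJ/mole) of the two states.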
7.2 Cytochromes
In nature there are only two major types of electron-carrier sites in addition to the blue copper proteins and the CuA site, viz. cytochromes and iron-sulphur clusters [161,162]. The cytochromes consist of an iron ion bound to a porphyrin ring. Two axial ligands complete the octahedral coordination sphere. During electron transfer, the iron ion alternates between Fe(II) and Fe(III). Several types of cytochromes exist in biological systems, depending on the substituents on the porphyrin ring, the axial ligands, and the number and arrangement of the haem groups in the protein (cytochromes a, b, c, f, etc.) [163]. Their reduction potentials range between -300 and +470 mV [113,114].
Several groups have tried to predict the reduction potentials of various cytochromes using theoretical approaches [51,95,97,114,164-171]. A few groups have also studied the reorganisation energy of these proteins. For example, the outer-sphere reorganisation energy of cytochrome c has been calculated to be 28-100 kJ/mole with various theoretical methods [6,51,93,95,97,100]. Most of this energy, 70-90 %, comes from the protein. It is also clear that the protein reduces the reorganisation energy compared to a haem group in water solution, for which it has been estimated to be 100-160 kJ/mole [95,97,100]. The inner-sphere reorganisation energy of the cytochromes is considered to be low, ranging from negligible to 48 kJ/mole [6,51,93,95,97]. It is normally estimated from the difference between the measured total self-exchange reorganisation energy and the calculated outer-sphere reorganisation energy. Considering the large variation of the latter, and an almost equally large span of experimental estimates (e.g. 70-140 kJ/mole for the self-exchange reorganisation energy of cytochrome c [172-174]), such estimates must be considered very approximate. Alternatively, the inner-sphere reorganisation energy has been estimated from vibrational frequencies and the observed changes in the haem geometry in crystal structures. However, these estimates are also approximate, since the observed changes in the bond lengths to the iron ion are smaller than the uncertainty in the crystal structures. Therefore, we have investigated the inner-sphere reorganisation energy of iron porphine (the porphyrin ring without any substituents) with different axial ligands [24].
The results presented in Table 7 show that if the axial ligands are uncharged, the reorganisation energy is small, 5-9 kJ/mole, appreciably smaller than for the blue copper proteins (62 kJ/mole). It varies somewhat with the axial ligands. Two methionine ligands (as in bacterioferritin [175]) give the lowest reorganisation energy, whereas the most common sets of ligands (His-His and His-Met, as in the b and c type cytochromes [163]) give slightly higher reorganisation energies. We have also tested a number of charged axial ligands, which have been suggested to be present in haem proteins [113]. These models have appreciably higher reorganisation energies, ranging from 20 kJ/mole (His-Cys) to 47 kJ/mole (His-Tyr). Interestingly, the only combination that has been unambiguously observed in a cytochrome is His-Tyr (in the d1 domain of cytochrome cd1 nitrite reductase [176]). At first glance, the results may indicate that the Tyr ligand in this cytochrome should be protonated. Yet, the reorganisation energy is not larger than that observed for blue copper proteins or iron-sulphur clusters [24,68,147], so it cannot be excluded that the Tyr ligand is deprotonated. All other characterised proteins with negatively charged axial ligands are enzymes with a catalytic function rather than electron carriers (often in combination with a five-coordinate iron ion).
Table 7. Geometries and inner-sphere reorganisation energies for a number of cytochrome models calculated by the B3LYP method [24]. The haem group was modelled by Fe(porphine), and Met, His, Amt (amino terminal), Cys, Tyr, and Glu were modelled by S(CH3)2, Im, CH3NH2, SCH3-, C6H5O-, and CH3COO-, respectively. All complexes were assumed to be in the low-spin state in accordance with experiments [162].
Axial ligands  Oxidation  Reorg. energy  Distance to Fe (pm)
1     2        state      (kJ/mole)      Ligand 1  Ligand 2  Npor
Met   Met      II          2.7           240       240       202
               III         2.1           240       240       202
His   Met      II          4.2           203       243       202
               III         4.1           200       244       201
His   His      II          3.7           205       205       202
               III         4.5           202       203       201
His   Amt      II          4.2           203       208       202-203
               III         4.4           200       205       201-202
His   Cys      II          9.7           211       238       202
               III        10.3           215       222       201-203
His   Tyr      II         21.2           206       199       202-203
               III        25.8           207       184       201-203
His   Glu      II         13.0           205       199       202-203
               III        13.4           207       187       201-202
Abundant experimental data are available for the structures of small inorganic haem models with various axial ligands [177,178]. From these, it can be concluded that our calculated Fe-Npor, Fe-NHis, Fe-SMet, and Fe-SCys distances are slightly too long, by 2-3, 4-5, 6, and 3 pm, respectively [24]. This probably reflects the accuracy of the B3LYP method. However, the discrepancy is the same (within 1 pm) for the two oxidation states. Therefore, the change in the Fe-ligand bond lengths upon reduction is accurately reproduced by our models, so the calculated reorganisation energies can be expected to be quite reliable. Notably, the accuracy of our optimised models is better than what can be expected for a metal site in protein crystallography [179]. It is therefore not meaningful to calibrate our results by comparison with a single protein structure.
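The totals quoted in the text (5-9 kJ/mole for uncharged ligands, 20 kJ/mole for His-Cys, 47 kJ/mole for His-Tyr) are recovered by adding the two entries of Table 7 for each ligand set. The minimal Python sketch below assumes, as this reading of the table implies, that each entry is the relaxation energy of one oxidation state (its energy at the other state's optimised geometry minus its energy at its own), so that the self-exchange inner-sphere reorganisation energy is their sum, a four-point scheme.

```python
# Per-state contributions (kJ/mole) from Table 7: (Fe(II) term, Fe(III) term).
TABLE7 = {
    ("Met", "Met"): (2.7, 2.1),
    ("His", "Met"): (4.2, 4.1),
    ("His", "His"): (3.7, 4.5),
    ("His", "Amt"): (4.2, 4.4),
    ("His", "Cys"): (9.7, 10.3),
    ("His", "Tyr"): (21.2, 25.8),
    ("His", "Glu"): (13.0, 13.4),
}
UNCHARGED = {("Met", "Met"), ("His", "Met"), ("His", "His"), ("His", "Amt")}


def four_point_lambda(term_reduced: float, term_oxidised: float) -> float:
    """Assumed four-point sum of the two relaxation energies:
    [E_II(geom_III) - E_II(geom_II)] + [E_III(geom_II) - E_III(geom_III)]."""
    return term_reduced + term_oxidised


totals = {ligands: four_point_lambda(*terms) for ligands, terms in TABLE7.items()}

# Print the ligand sets in order of increasing reorganisation energy.
for (lig1, lig2), lam in sorted(totals.items(), key=lambda item: item[1]):
    kind = "uncharged" if (lig1, lig2) in UNCHARGED else "charged"
    print(f"{lig1}-{lig2:4s} {kind:9s} {lam:5.1f} kJ/mole")
```

Met-Met comes out at 4.8 kJ/mole (the "5" in the text), and His-Glu, not discussed explicitly, at about 26 kJ/mole; every charged set lies above every uncharged one.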
We have also investigated how the cytochromes achieve a low reorganisation energy [24], using methods similar to those used for the blue copper proteins [68]. First, an octahedral geometry is favourable for electron transfer, since there is no change in the angles upon reduction. Second, cytochromes employ nitrogen and sulphur ligands, which form weaker bonds with smaller force constants than oxygen ligands (the reorganisation energy of Fe(NH3)6 is a third of that of Fe(H2O)6). Third, covalent strain in the porphyrin ring decreases the changes in the Fe-Npor distances. For example, if the porphyrin ring is replaced by two molecules of diformamidate (NHCHNH-), a small ligand that has often been used as a reasonable minimal model for the porphyrin ring [180], the equatorial Fe-Npor distances change by 9-10 pm upon reduction, compared to ~1 pm with the full porphine model. This increases the reorganisation energy by 44 kJ/mole.
It is informative to compare the haem group and the blue copper proteins, since we have argued strongly against a reduction of the reorganisation energy by strain in the blue copper proteins [10,12,14,49,63]. The major difference is that the porphyrin ring is held together by strong covalent bonds and is constrained by the aromaticity of the ring, whereas in the protein, the ligands are oriented by weak torsional constraints and non-bonded interactions. Covalent bonds are stronger than metal-ligand bonds, whereas torsions and non-bonded interactions are weaker. Therefore, the iron ion is constrained in the haem group, whereas in a protein it is more likely that the protein will distort if the preferences of the metal and the protein differ. Moreover, it must be recognised that if significant strain were involved in the binding of a metal, the metal would simply not bind; a strain energy of 70 kJ/mole, as has been suggested for the blue copper proteins [12], reduces the binding constant by a factor of about 1.5·10^12.
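The factor 1.5·10^12 follows from the Boltzmann relation K = exp(ΔG/RT) applied to the strain energy. The sketch below assumes T = 300 K, which is not stated in the text but reproduces the quoted figure; the function name is illustrative.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol K)


def binding_penalty(strain_kj_per_mol: float, temperature_k: float = 300.0) -> float:
    """Factor by which a strain energy reduces a binding constant, exp(dG/RT).

    The default temperature of 300 K is an assumption, chosen because it
    reproduces the 1.5e12 quoted for 70 kJ/mole."""
    return math.exp(strain_kj_per_mol / (GAS_CONSTANT * temperature_k))


print(f"70 kJ/mole -> factor {binding_penalty(70.0):.1e}")
```

Because the dependence is exponential, the factor is sensitive to the assumed temperature, but any physiological temperature gives a penalty of order 10^11 to 10^12.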
7.3 Iron-sulphur clusters
Iron-sulphur clusters are the third type of widespread electron-transfer sites in biology. They consist of iron ions surrounded by four sulphur ions, either thiolate groups from cysteine residues or inorganic sulphide ions. Regular clusters with one (rubredoxins), two, three, or four (ferredoxins) iron ions are known, as well as a number of more irregular clusters, some also with ligands other than cysteine [112,181]. Their reduction potentials vary between -700 and +400 mV [112]. The electronic structure, spectroscopy, and reduction potential have been thoroughly studied for all common classes of iron-sulphur clusters [52,89,182-191]. In particular, Noodleman and coworkers have performed detailed quantum chemical calculations on iron-sulphur clusters in various spin states [192-198]. It is now settled that rubredoxin contains an iron ion in the high-spin state (quintet for FeII, sextet for FeIII), whereas in the [2Fe-2S] clusters, the two iron ions are both in the high-spin state but antiferromagnetically coupled to form a singlet or doublet state for the oxidised (III+III) and reduced (mixed-valence II+III) forms, respectively [112,162]. In contrast to the CuA site, the unpaired spin is trapped at one of the iron ions in the mixed-valence state.
However, nobody seems to have studied the reorganisation energy of the iron-sulphur clusters systematically. Therefore, we have initiated an investigation of the inner-sphere reorganisation energy of Fe(SCH3)4, (SCH3)2FeS2Fe(SCH3)2, and (SCH3)2FeS2Fe(Im)2 [24]. The optimised structures and the calculated inner-sphere reorganisation energies are collected in Table 8. The Fe-S distances in the Fe(SCH3)4 model increase from 232 to 242 pm when the site is reduced. This 10-pm increase is similar to what is observed in inorganic model complexes, but the average distances are shorter in the models, 227 and 236 pm, respectively [199]. Thus, the Fe-S distances are again 5-6 pm too long, but the change during oxidation is well reproduced. However, in the protein, the distances seem to be even shorter, 226 and 232 pm according to EXAFS experiments, and the change is smaller [112,200]. This is probably an effect of the protein environment, where several backbone amide groups form hydrogen bonds to the SCys atoms [76,201]. ComQum calculations on rubredoxin show that the protein reduces the calculated average Fe-S distances to 230 and 236 pm for the oxidised and reduced site, respectively [24]. Thus, the hydrogen bonds shorten the bonds more in the reduced than in the oxidised complex, giving excellent agreement with experiment for the change in bond length upon reduction. The calculated inner-sphere reorganisation energy of the Fe(SCH3)4 model in vacuum is 40 kJ/mole. In the protein, the hydrogen bonds reduce the reorganisation energy by ~12 kJ/mole [24]. The inner-sphere reorganisation energy of rubredoxin has been estimated from the change in the Fe-S bond lengths and the corresponding vibrational frequency [208]. The result is ~10 kJ/mole lower than our estimate, which illustrates that the reorganisation energy does not arise only from the changes in these bond lengths.
Table 8. Geometries and inner-sphere reorganisation energies for iron-sulphur models calculated by the B3LYP method [24]. Si denotes the inorganic sulphide ions and Fe the Fe-Fe distance.
Model                  Oxidation  Reorg. energy  Distance to Fe (pm)
                       state      (kJ/mole)      SCys     Si       Fe       NHis
Fe(SCH3)4              II         21.4           242
                       III        18.3           232
Models [199]           II                        232-238
                       III                       225-228
Proteins [202,203]     II                        224-236
                       III                       223-233
(SCH3)2FeS2Fe(SCH3)2   II+III     34.3           245-249  225-241  299
                       III+III    41.1           235-237  226-227  285
Models [204]           III+III                   230-231  219-223  270
Proteins [205]         III+III                   222-237  211-228  260-278
(SCH3)2FeS2Fe(Im)2     II+III     18.3           233-239  225-229  271      216-220
                       III+III    21.8           227-232  219-230  275      210-212
Proteins [206,207]     II+III                    222-231  223-235  271      213-223
We have also studied the (SCH3)2FeS2Fe(SCH3)2 complex in its fully oxidised and mixed-valence forms as a model of the [2Fe-2S] ferredoxins. The optimised Fe-S distances are 5-10 pm longer than in experiments, again reflecting the systematic error of the B3LYP method. The discrepancy for the Fe-Fe distance is slightly larger, but this is probably an effect of a flexible Fe-Fe interaction, as for the Cu-Cu bond in the CuA site [147]. Our calculated reorganisation energy is 75 kJ/mole, appreciably larger than for the rubredoxin site. This is in accordance with a lower rate of electron transfer for these sites in proteins as well as in model systems [162,204].
At first sight, the increase in reorganisation energy for the dimeric iron-sulphur clusters (compared to the monomeric rubredoxin site) may seem a bit strange, considering that for the dimeric CuA site, the reorganisation energy decreased compared to the blue-copper monomer. The reason for this behaviour is that the unpaired electron in the mixed-valence iron-sulphur site is localised to one of the iron ions, whereas it is delocalised in the CuA site. It has been suggested that a delocalised dimer should have approximately half the reorganisation energy of the monomer, because the change in the bond lengths upon reduction is reduced by a factor of two [144,155,156]. This was essentially what we observed for the CuA site [147]. In the iron-sulphur dimers, the change in the bond lengths upon reduction is not significantly smaller. In fact, it is slightly increased around the iron ion that is reduced (13 pm compared to 9 pm for the rubredoxin model), and there are also appreciable changes around the other iron ion (6 pm on average). Even if the force constants are reduced around the reduced iron ion, the number of bonds is doubled. Therefore, the total reorganisation energy of the ferredoxin model increases. Interestingly, our model of the Rieske iron-sulphur site, (SCH3)2FeS2Fe(Im)2, has an appreciably lower reorganisation energy, 40 kJ/mole. This is due to smaller changes around the iron ions, 2-8 pm (cf. Table 8), and lower force constants of the imidazole ligands.
As for the cytochromes and blue copper proteins, we have also investigated how the iron-sulphur clusters achieve a low inner-sphere reorganisation energy. First, iron is a better ion than copper, since Fe(II) and Fe(III) have similar preferences for geometry and coordination number. Moreover, even at a fixed geometry, copper gives a higher reorganisation energy than iron. For example, the reorganisation energy of an octahedral Cu(H2O)6 complex is twice as large as for Fe(H2O)6. This probably reflects the difference in charge between the two redox pairs (Cu(I)/Cu(II) versus Fe(II)/Fe(III)).
Second, four ligands give slightly lower reorganisation energies than six, provided that the geometry does not change, since there are fewer bonds. Finally, iron-sulphur sites employ soft and large thiolate ligands, which give smaller reorganisation energies than harder ligands such as water.
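The "half the reorganisation energy" argument for a delocalised dimer, and its failure for the localised iron-sulphur dimer, can be made explicit with a harmonic bond-stretch model, λ ≈ Σj k Δrj² (the self-exchange sum of the two relaxation energies ½kΔr² when both oxidation states share the force constant k). The Python sketch below is only an illustration: it assumes one common, arbitrary force constant for every Fe-S bond and four bonds per iron ion, and uses the average bond-length changes quoted above (9 pm for the monomer; 13 and 6 pm for the localised dimer).

```python
from typing import Sequence

K = 1.0  # common force constant in arbitrary units (hypothetical)


def harmonic_lambda(changes_pm: Sequence[float], k: float = K) -> float:
    """Harmonic self-exchange reorganisation energy with equal force constants
    in both oxidation states: sum over bonds of k * dr**2
    (two relaxation terms of 0.5 * k * dr**2 for each bond)."""
    return sum(k * dr ** 2 for dr in changes_pm)


monomer = harmonic_lambda([9.0] * 4)                 # one Fe, four bonds, 9 pm each
delocalised = harmonic_lambda([4.5] * 8)             # twice the bonds, half the change
localised = harmonic_lambda([13.0] * 4 + [6.0] * 4)  # reduced Fe plus the other Fe

print(f"delocalised dimer / monomer: {delocalised / monomer:.2f}")
print(f"localised dimer / monomer:   {localised / monomer:.2f}")
```

Because each bond contributes in proportion to the square of its change, halving every change while doubling the number of bonds gives exactly one half, whereas the localised case gives about a 2.5-fold increase. That overshoots the calculated ratio (75 versus 40 kJ/mole), as expected, since the single force constant ignores the lower force constants around the reduced iron ion.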
8. PROTEIN STRAIN
The suggestion that proteins use mechanical strain for their function is an old but still viable hypothesis [12,10,209-211]. The most classical example of a protein for which strain has been suggested to play a functional role is probably lysozyme [212]. It was originally suggested that this protein forces its substrate to bind in an unfavourable conformation, viz. a conformation similar to the transition state. However, theoretical calculations by Levitt and Warshel convincingly showed that strain has a negligible influence on the rate of this enzyme; instead, the catalytic power is gained by favourable electrostatic interactions in the transition state [50]. This and other cases have led several leading biophysical chemists to argue strongly against strain as an important factor in enzyme catalysis [50,213,214].
To make strain hypotheses testable, it is vital to define what is meant by strain. Warshel has defined strain as distortions caused by covalent interactions (bonds, angles, and dihedrals) and possibly also the repulsive part of the van der Waals interaction [50]. This is close to the intuitive conception of mechanical strain, but it is hard to estimate except in classical simulations of proteins. We have used a wider definition of strain [49]: a change in the geometry of a ligand (e.g. a metal coordination sphere) when bound to a protein. This definition includes effects that are normally not considered as mechanical strain, most prominently electrostatic and solvation effects. The change must be measured relative to a reference state. We have used the vacuum geometry as the strainless state, but other reference states are also conceivable, e.g. the ligand in water solution. However, such a choice is less well-defined. For example, how large changes should be allowed in the reference state? May the number of ligands change? May a water molecule come in as an axial ligand, or as an equatorial ligand, or may it even replace the protein ligands? It must be recognised that any ligand necessarily acquires slightly different properties when bound to a protein. This is an effect of the trivial fact that a protein is different from vacuum or solution (it has another effective dielectric constant and presents specific electrostatic interactions). Such changes have been estimated for a number of protein-ligand complexes, and Liljefors et al. have argued that the energies involved are less than 13 kJ/mole if the reference state is the ligand in solution [215]. If the reference state is instead the ligand in vacuum, appreciably larger energies are observed.
We have, for example, calculated the energies associated with the change in geometry of the metal site when it is inserted from vacuum into a protein to be 30-60 kJ/mole for the catalytic and structural zinc ions in alcohol dehydrogenase [75-77], with similar values for the blue copper proteins and iron-sulphur clusters, 16-51 kJ/mole [24,26]. We suppose that the strain hypotheses are intended to deal with systems where the strain is larger than normal and has a functional role. Therefore, we consider distortions smaller than this insignificant, unless there is a clear function of the strain [49]. Originally, the entatic state and the induced rack theories for the blue copper proteins discussed only the rigid protein and the strained cupric conformation, i.e. mechanical strain. Lately, however, they have started to embrace virtually any modifying effect of the protein. For example, in a recent commentary [63], Gray, Malmström and Williams consider exclusion of water a "constraining factor". This is a most unfortunate widening and blurring of the concept, making discussions harder. Moreover, seen in that way, all proteins are strained or entatic (i.e. they are adapted to functional advantage [63]), but at the same time such a hypothesis becomes void of any predictive value. With Warshel's or our definition of strain, we have shown without any doubt that the cupric structure of the blue copper proteins is not strained to any significant degree [14,33,34], especially if the dynamics at ambient temperatures and the dielectric surroundings of the copper site are considered [26,35,45,54]. The electronic structure explains why protein sites with a cysteine ligand have structures close to a tetrahedron, whereas inorganic complexes are tetragonal [65]. Furthermore, we and other groups have shown that the unusual spectroscopic properties and the high reduction potential of the blue copper proteins are a natural consequence of the covalent nature of the bond between copper and the cysteine thiolate group [33-36,65,83-86,96,119,216]. Similarly, we have shown that the low reorganisation energy is also intrinsic to the blue copper site [26,68]. Clearly, strain is not needed to explain any of the unusual properties of the blue copper proteins, and there is no indication that mechanical strain has any functional value for the proteins.
The similarity in structure between the oxidised and reduced forms of the blue copper proteins has often been taken as an argument for the strain hypotheses [63]. However, our results show that this is a natural effect of the copper ligands, especially the cysteine ligand [14]. Likewise, the similarity between metal-substituted blue copper proteins and their native counterparts has been taken as an argument for strain [63,217]. Yet, there is an appreciable variation in the metal-ligand distances for the various proteins, viz. 43, 31, 102, and 103 pm for the bonds to N, SCys, O, and SMet, respectively [63]. This points to a plastic, rather than rigid, metal site. Moreover, the changes reflect the softness of the metal, showing that the metal, rather than the protein, determines the geometry of the site. Similarly, in trans mutations of the copper ligands in azurin have provided strong experimental evidence for a flexible copper site [218].
The fact that the structure of the copper-free form of the blue copper proteins is similar to that of the metal-loaded form has also frequently been taken as an argument for strain. However, this does not show that the copper site is strained. Instead, it may facilitate metal binding [14,77]; if the metal-chelating site were not present before the metal is bound, metal binding would clearly be harder [14,144]. Moreover, the copper-free structure is stabilised by several favourable hydrogen bonds [219-221], showing that the structure is not unnatural. In fact, there is another structure of apo-azurin [57], in which a water molecule occupies the metal site, leading to appreciable changes in the geometry of the site. Again, this points to a substantial flexibility of the metal site. A fourth argument for the strain hypotheses is the difficulty of synthesising small inorganic models that reproduce the geometry and macroscopic properties of the blue copper proteins [66,67]. The most successful models involve strained ligands [222], and the first trigonal model was reported only very recently [223]. However, inorganic modelling of blue-copper sites is full of practical problems [224]. Most prominently, Cu(II) and thiolate ligands tend to undergo a redox reaction, forming Cu(I) and disulphide. In the proteins, this reaction is inhibited by the bulk of the protein. Second, our calculations show that the stability of trigonal and tetragonal Cu(II) complexes depends strongly on the ligands. A thiolate ligand is not enough to stabilise a trigonal structure; another weak ligand, such as methionine, must also be present [65]. In fact, there is still no small inorganic model with the same set of ligands (N2SS*, i.e. two histidines, a cysteine, and a methionine) as the blue copper proteins [66,67]. Another problem with small models is that molecules from the solution (e.g. water) may come in and stabilise tetragonal structures and higher coordination numbers [224]. It is illustrative that very few inorganic complexes reproduce the properties of the blue copper proteins [66,67], whereas typical blue-copper sites have been constructed in several proteins and peptides by metal substitution, e.g. insulin, alcohol dehydrogenase, and superoxide dismutase [66]. This shows that the problem is related more to protection from water and dimer formation than to strain.
This does not mean that the protein is unimportant for the function of the blue copper proteins. On the contrary, the protein provides the proper ligands to the copper site and protects it from unwanted ligands. This may also involve a restriction of the number of ligands of the copper ion. Typically, Cu(II) binds 4-6 ligands, whereas Cu(I) prefers 2-4, but with the bulky, soft, and negatively charged sulphur ligands, the two oxidation states accept the same coordination number. Second, the protein modifies the dielectric properties of the surroundings of the copper site, thereby reducing the outer-sphere reorganisation energy and modulating the reduction potential of the copper site. Third, the protein offers a proper path or matrix for electron transfer and the docking sites for the donor and acceptor proteins [144]. Clearly, the blue copper proteins also modulate the geometry of the copper site. The rhombic type 1 proteins stabilise a tetragonal structure, whereas the axial type 1 proteins stabilise the trigonal structure of the same copper coordination sphere.
However, the energy needed for such a stabilisation, <7 kJ/mole [34], is less than the typical distortion energies occurring in all proteins due to the subtle mismatch between the protein and the ligand sphere [57,75-77,215]. Furthermore, the forces leading to such a stabilisation include electrostatics and other factors usually not defined as mechanical strain, and the functional value of this stabilisation is unclear since both types of sites are present in proteins with a similar function. In conclusion, we have in a series of investigations addressed the function and properties of the blue copper proteins. We have emphasised the in~ortance of defining what is meant by the strain and to discuss strain in quantitative terms. For such an investigation, theoretical methods seem to be better suited than experiments, since they directly give the energy and allow the energy to be divided into contributions from different degrees of freedom. We have in no case found any indication of a functional role for strain in the blue copper proteins. On the contrary, copper complexes in vacuum seem to have mostly the same properties as in the proteins. The proteins have constructed a metal coordinating site which
minimises the electron-transfer reorganisation energy by an appropriate choice of metal ligands, viz. ligands that are a compromise between those preferred by the Cu(I) and Cu(II) ions. In particular, the cysteine ligand is crucial, giving rise to rather tetrahedral structures (either trigonal or tetragonal). Moreover, the methionine ligand forms a very flexible bond, which can change considerably at a small expense of energy. Thus, in our eyes, strain hypotheses for the blue copper proteins are a case for Ockham's razor.
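Discussing strain in quantitative terms comes down to comparing two energies of the same model: one with the model frozen at the geometry it has in the protein, and one after full relaxation in vacuum. The sketch below only does that bookkeeping. The function name is hypothetical, both energies are assumed to come from separate quantum-chemical calculations (in hartree) at the same level of theory, and the numbers in the usage line are made up for illustration, not results from this chapter.

```python
# Conversion factor from hartree to kJ/mol (CODATA 2018, rounded).
HARTREE_TO_KJ_PER_MOL = 2625.4996


def strain_energy_kj_per_mol(e_protein_geometry: float, e_relaxed: float) -> float:
    """Strain energy of a metal-site model in kJ/mol (hypothetical helper).

    e_protein_geometry: energy (hartree) of the model frozen at the geometry
        it adopts in the protein (a single-point calculation).
    e_relaxed: energy (hartree) of the same model after full geometry
        optimisation in vacuum, at the same level of theory.
    A positive value means the surroundings hold the site away from its
    vacuum minimum; the thermal energy RT (about 2.5 kJ/mol at 300 K)
    gives a sense of scale.
    """
    return (e_protein_geometry - e_relaxed) * HARTREE_TO_KJ_PER_MOL


if __name__ == "__main__":
    # Hypothetical energies, chosen only to exercise the function.
    print(round(strain_energy_kj_per_mol(-1999.998, -2000.000), 1))  # 5.3
```

Because both terms must use the same model and method, any systematic error of the method largely cancels in the difference; only the geometric constraint imposed by the surroundings remains.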
9. CONCLUDING REMARKS
We have in recent years seen an exciting development of theoretical methods for use in biochemistry, and this trend can be expected to continue at an increasing pace in the future. The background is the explosive increase in computer power and the development of methods and software that can deal with large molecules at reasonable cost. Density functional methods, in particular, have become a viable tool for studies of ground-state properties. We have used them to obtain geometries, which have been calibrated by comparisons with accurate crystal structures of model complexes and with EXAFS data. Typically, metal distances to sulphur are ~6 pm too long, whereas distances to histidine ligands are ~4 pm too long. This accuracy is better than what is normally obtained by protein crystallography [179]. Moreover, the errors in the calculated structures are systematic, so differences in geometries are accurately reproduced (within 1 pm). Naturally, geometry optimisations of small models in vacuum ignore effects of the surroundings, e.g. hydrogen bonds, hydrophobic interactions, and steric effects, which determine torsion angles of the complexes in the protein and may also determine the length of weak axial bonds, such as the Cu-SMet bond. Therefore, geometry optimisation methods that include the full detail of the surroundings [26,60], or even the crystallographic raw data [225], are most promising, even if they are more demanding. Multiconfigurational second-order perturbation theory (CASPT2) has been used to obtain accurate energies and to study electronic spectra of the proteins. The results are surprisingly accurate and have been used to explore the relation between structural and spectroscopic properties. However, one should bear in mind all uncertainties inherent in the comparison of theoretical and experimental data. The CASPT2 method itself has a well-documented uncertainty in computed excitation energies of up to 2500 cm⁻¹ [28].
To this should be added deficiencies in the basis set and the effect of the protein and solvent, which was described by a rather primitive point-charge model. The significant influence of the protein on the spectroscopic properties is interesting and calls for a more accurate investigation. Preliminary results indicate that most of the effect comes from a few atoms near SCys. Thus, the effect strongly depends on the charge of
these atoms. This means that it would be enough to improve the model of a few residues near the copper site. These residues may, for example, be modelled by higher electrostatic moments calculated at the actual conformation and environment in the protein. Thus, it is clear that the surrounding protein and solvent have an important influence on the structure and spectra of the copper sites. For reduction potentials and outer-sphere reorganisation energies, these effects are of the same magnitude as the electronic effects and are therefore essential. We have used three different levels of approximation for these effects: an array of point charges, continuum models, and combined quantum chemical and classical simulations. We anticipate that this will be an area of intensive development in the future. Several quantities are easier to estimate by theoretical methods than by experiments. For example, it is hard to decide unambiguously by experiment whether a bound ligand is strained by a protein or not. With theoretical methods, it is quite simple, since we can obtain the optimal structure of the isolated ligand and therefore directly estimate the effect of strain (in geometry or energy) by comparing its structure and energy with those in the protein. We can even estimate the free energy cost for the protein to bind a certain ligand [54]. Generally speaking, it is often easier to obtain energies by theoretical methods than by experiments. This is a great advantage, since chemical processes are governed by energies. Therefore, theoretical methods can directly study reaction mechanisms by obtaining reaction and activation energies, whereas experimental investigations mostly give indirect evidence that must be interpreted in terms of structures and energies. Moreover, technical problems, e.g.
if the molecule of interest is spectroscopically invisible, short-lived, or hazardous, pose no obstacle for theoretical chemistry; any molecule can be studied in the computer for as long as one wishes. An illustrative example of the advantage of theoretical methods is the Cu-SMet bond in blue copper proteins. It is the only bond that shows a clear variation among different crystal structures (i.e. an experimentally discernible variation). Therefore, it has been suggested to be important for the geometry and reduction potential of the proteins [13,112,116,118,119]. However, a large variation does not necessarily imply a functional importance. Our calculations show that the variation is instead caused by the flat potential surface (a small force constant) [35]. Thus, the large variation in bond length corresponds to a small variation in energy, and therefore the bond has a minor influence on the structure and function of the copper site. In conclusion, we have shown that theoretical calculations can be used to successfully solve biochemical problems. Like experimental methods, they involve assumptions and interpretation, and they have their limitations, but there are many problems that are best studied by theory. Thus, theoretical methods have become a competitive alternative to experiments for biochemical investigations.
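The argument about the Cu-SMet bond rests on the harmonic relation between bond stretching and energy, E = ½k(r - r₀)². A minimal sketch of that relation follows. The two force constants are hypothetical, chosen only to contrast a stiff bond with a soft one; they are not the values computed in [35]. Within the same thermal energy budget RT, the length of the soft bond varies by a factor sqrt(k_stiff/k_soft) more than that of the stiff bond.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant, kJ/(mol K)


def stretch_energy(k: float, dr: float) -> float:
    """Harmonic energy (kJ/mol) for a displacement dr (Å) from equilibrium.

    k is the force constant in kJ/(mol Å^2)."""
    return 0.5 * k * dr ** 2


def displacement_at_energy(k: float, energy: float) -> float:
    """Displacement (Å) at which the harmonic energy reaches `energy` (kJ/mol)."""
    return math.sqrt(2.0 * energy / k)


if __name__ == "__main__":
    rt = R_KJ_PER_MOL_K * 300.0  # thermal energy at 300 K, about 2.5 kJ/mol
    # Hypothetical force constants in kJ/(mol Å^2), for contrast only.
    for label, k in [("stiff bond", 500.0), ("soft bond", 20.0)]:
        dr = displacement_at_energy(k, rt)
        print(f"{label}: k = {k:5.0f}, length varies by ~{dr:.2f} Å within RT")
```

The same logic read in reverse is the point made in the text: a large observed spread in bond length, if the potential is flat, costs little energy and therefore says little about function.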
REFERENCES
1. A.G. Sykes, Adv. Inorg. Chem. 36 (1990) 377.
2. E.T. Adman, Adv. Prot. Chem. 42 (1991) 145.
3. A. Messerschmidt, Struct. Bond. 90 (1998) 37.
4. J.M. Guss, H.D. Bartunik & H.C. Freeman, Acta Cryst. B48 (1992) 790.
5. W.E.B. Shepard, B.F. Anderson, D.A. Lewandoski, G.E. Norris & E.N. Baker, J. Am. Chem. Soc. 112 (1990) 7817.
6. R.A. Marcus & N. Sutin, Biochim. Biophys. Acta 811 (1985) 265.
7. R.J.P. Williams (1963) in Molecular basis of enzyme action and inhibition (P.A.E. Desnuelle, ed.), p. 133, Pergamon Press, Oxford.
8. B.G. Malmström (1965) in Oxidases and related redox systems (T.E. King, H.S. Mason & M. Morrison, eds.), vol. 1, p. 207, Wiley, New York.
9. B.L. Vallee & R.J.P. Williams, Proc. Natl. Acad. Sci. USA 59 (1968) 498.
10. R.J.P. Williams, Eur. J. Biochem. 234 (1995) 363.
11. H.B. Gray & B.G. Malmström, Comments Inorg. Chem. 2 (1983) 203.
12. B.G. Malmström, Eur. J. Biochem. 223 (1994) 711.
13. J.A. Guckert, M.D. Lowery & E.I. Solomon, J. Am. Chem. Soc. 117 (1995) 2817.
14. U. Ryde, M.H.M. Olsson, K. Pierloot & B.O. Roos, J. Mol. Biol. 261 (1996) 586.
15. J.E. Rice, H. Horn, B.H. Lengsfield, A.D. McLean, J.T. Carter, E.S. Replogle, L.A. Barnes, S.A. Maluendes, G.C. Lie, M. Gutowski, W.E. Rude, S.P.A. Sauer, R. Lindh, K. Andersson, T.S. Chevalier, P.-O. Widmark, D. Bouzida, G. Pacansky, K. Singh, C.J. Gillan, P. Carnevali, W.C. Swope & B. Liu (1995) Mulliken™ Version 2.25b, internal release, IBM Corporation, Almaden, USA.
16. O. Treutler & R. Ahlrichs, J. Chem. Phys. 102 (1995) 346.
17. M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman, V.G. Zakrzewski, J.A. Montgomery, R.E. Stratmann, J.C. Burant, S. Dapprich, J.M. Millam, A.D. Daniels, K.N. Kudin, M.C. Strain, O. Farkas, J. Tomasi, V. Barone, M. Cossi, R. Cammi, B. Mennucci, C. Pomelli, C. Adamo, S. Clifford, J. Ochterski, G.A. Petersson, P.Y. Ayala, Q. Cui, K. Morokuma, D.K. Malick, A.D. Rabuck, K. Raghavachari, J.B. Foresman, J. Cioslowski, J.V. Ortiz, B.B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. Gomperts, R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham, C.Y. Peng, A. Nanayakkara, C. Gonzalez, M. Challacombe, P.M.W. Gill, B.G. Johnson, W. Chen, M.W. Wong, J.L. Andres, M. Head-Gordon, E.S. Replogle & J.A. Pople (1998) Gaussian 98, Revision A.5, Gaussian, Inc., Pittsburgh, PA.
18. R.H. Hertwig & W. Koch, Chem. Phys. Lett. 268 (1997) 345.
19. A. Ricca & C.W. Bauschlicher, J. Phys. Chem. 98 (1994) 12899.
20. A. Ricca & C.W. Bauschlicher, Theor. Chim. Acta 92 (1995) 123.
21. M.C. Holthausen, M. Mohr & W. Koch, Chem. Phys. Lett. 240 (1995) 245.
22. C.W. Bauschlicher, Chem. Phys. Lett. 246 (1995) 40.
23. A. Schäfer, H. Horn & R. Ahlrichs, J. Chem. Phys. 97 (1992) 2571.
24. E. Sigfridsson, M.H.M. Olsson & U. Ryde (2000) "A comparison of the inner-sphere reorganisation energies of cytochromes, iron-sulphur clusters, and blue copper proteins", J. Mol. Biol., submitted.
25. W.J. Hehre, L. Radom, P.v.R. Schleyer & J.A. Pople (1986) Ab initio molecular orbital theory, Wiley-Interscience, New York.
26. U. Ryde & M.H.M. Olsson (2000) "Accurate geometry optimisations of blue-copper models in vacuum, solvent, and the protein", J. Phys. Chem., submitted.
27. K. Andersson, P.-Å. Malmqvist & B.O. Roos, J. Chem. Phys. 96 (1992) 1218.
28. B.O. Roos, K. Andersson, M.P. Fülscher, P.-Å. Malmqvist, L. Serrano-Andrés, K. Pierloot & M. Merchán, in Advances in Chemical Physics: New Methods in Computational Quantum Mechanics (I. Prigogine & S.A. Rice, eds.) 93 (1996) 219, John Wiley & Sons, New York.
29. K. Pierloot, B. Dumez, P.-O. Widmark & B.O. Roos, Theor. Chim. Acta 90 (1995) 87.
30. B.O. Roos, K. Andersson, M.P. Fülscher, L. Serrano-Andrés, K. Pierloot, M. Merchán & V. Molina, J. Mol. Struct. 388 (1996) 257.
31. P.-Å. Malmqvist & B.O. Roos, Chem. Phys. Lett. 245 (1989) 189.
32. K. Andersson, M.R.A. Blomberg, M.P. Fülscher, G. Karlström, R. Lindh, P.-Å. Malmqvist, P. Neogrády, J. Olsen, B.O. Roos, A.J. Sadlej, M. Schütz, L. Seijo, L. Serrano-Andrés, P.E.M. Siegbahn & P.-O. Widmark (1997) MOLCAS version 4, University of Lund, Sweden.
33. J.O.A. De Kerpel, K. Pierloot, U. Ryde & B.O. Roos, J. Phys. Chem. B 102 (1998) 4638.
34. K. Pierloot, J.O.A. De Kerpel, U. Ryde, M.H.M. Olsson & B.O. Roos, J. Am. Chem. Soc. 120 (1998) 13156.
35. M.H.M. Olsson & U. Ryde, J. Biol. Inorg. Chem. 4 (1999) 654.
36. K. Pierloot, J.O.A. De Kerpel, U. Ryde & B.O. Roos, J. Am. Chem. Soc. 119 (1997) 218.
37. J.O.A. De Kerpel, K. Pierloot & U. Ryde, J. Phys. Chem. B 103 (1999) 8375.
38. U. Ryde, M.H.M. Olsson, B.O. Roos, K. Pierloot & J.O.A. De Kerpel (1998) in The Encyclopedia of Computational Chemistry (P.v.R. Schleyer, N.L. Allinger, T. Clark, J. Gasteiger, P.A. Kollman, H.F. Schaefer III & P.R. Schreiner, eds.), John Wiley & Sons, Chichester, p. 2255.
39. A. Borin-Carlos, M.H.M. Olsson, U. Ryde, B.O. Roos, E. Cedergren-Zeppezauer & A. Merli (2000) "A theoretical study of the structure and electronic spectrum of Cu-substituted alcohol dehydrogenase", J. Am. Chem. Soc., submitted.
40. D.A. Case, D.A. Pearlman, J.W. Caldwell, T.E. Cheatham III, W.S. Ross, C.L. Simmerling, T.A. Darden, K.M. Merz, R.V. Stanton, A.L. Cheng, J.J. Vincent, M. Crowley, D.M. Ferguson, R.J. Radmer, G.L. Seibel, U.C. Singh, P.K. Weiner & P.A. Kollman (1997) Amber 5.0, University of California, San Francisco.
41. J. Tomasi & M. Persico, Chem. Rev. 94 (1994) 2027.
42. F.M. Floris, J. Tomasi & J.L. Pascual-Ahuir, J. Comput. Chem. 12 (1991) 784.
43. V. Barone & M. Cossi, J. Phys. Chem. A 102 (1998) 1995.
44. J.B. Foresman, T.A. Keith, K.B. Wiberg, J. Snoonian & M.J. Frisch, J. Phys. Chem. 100 (1996) 16098.
45. M.H.M. Olsson & U. Ryde (2000) "A theoretical study of the reduction potential of blue copper proteins", manuscript in preparation.
46. E. Sigfridsson & U. Ryde, J. Comput. Chem. 19 (1998) 377.
47. B. Honig, Science 268 (1995) 1144.
48. D. Bashford, Lecture Notes in Computer Science 1343 (1997) 233.
49. U. Ryde, M.H.M. Olsson, B.O. Roos, J.O.A. De Kerpel & K. Pierloot (2000) "On the role of strain in blue copper proteins", J. Biol. Inorg. Chem., in press.
50. A. Warshel (1991) Computer modelling of chemical reactions in enzymes and solutions, p. 209, John Wiley & Sons, New York.
51. A.K. Churg, R.M. Weiss, A. Warshel & T. Takano, J. Phys. Chem. 87 (1983) 1683.
52. P.J. Stephens, D.R. Jollie & A. Warshel, Chem. Rev. 96 (1996) 2491.
53. C.A.P. Libeu, M. Kukimoto, M. Nishiyama, S. Horinouchi & E.T. Adman, Biochemistry 36 (1997) 13160.
54. J.O.A. De Kerpel & U. Ryde, Proteins: Struct. Funct. Genet. 36 (1998) 157.
55. A. Warshel & M. Karplus, J. Am. Chem. Soc. 94 (1972) 5612.
56. U.C. Singh & P.A. Kollman, J. Comput. Chem. 7 (1986) 718.
57. U. Ryde, J. Comput.-Aided Mol. Design 10 (1996) 153.
58. M. Svensson, S. Humbel, R.D.J. Froese, T. Matsubara, S. Sieber & K. Morokuma, J. Phys. Chem. 100 (1996) 19357.
59. K.P. Eurenius, D.C. Chatfield & B.R. Brooks, Int. J. Quant. Chem. 60 (1996) 1189.
60. M.L. Field (1998) in The Encyclopedia of Computational Chemistry (P.v.R. Schleyer, N.L. Allinger, T. Clark, J. Gasteiger, P.A. Kollman, H.F. Schaefer III & P.R. Schreiner, eds.), John Wiley & Sons, Chichester, p. 2255.
61. U. Ryde, J.O.A. De Kerpel, K. Pierloot & M.H.M. Olsson (2000) "The structure, spectroscopy, and reorganisation energy of azurin", manuscript in preparation.
62. N. Bonander, T. Vänngård, L.-C. Tsai, V. Langer, H. Nar & L. Sjölin, Proteins: Struct. Funct. Genet. 27 (1997) 385.
63. H.B. Gray, R.J.P. Williams & B.G. Malmström (2000) "Copper coordination in blue proteins", J. Biol. Inorg. Chem., in press.
64. F.A. Cotton & G. Wilkinson (1988) Advanced inorganic chemistry, Wiley, New York.
65. M.H.M. Olsson, U. Ryde, B.O. Roos & K. Pierloot, J. Biol. Inorg. Chem. 3 (1998) 109.
66. N. Kitajima, Adv. Inorg. Chem. 39 (1992) 1.
67. S. Mandal, G. Das, R. Singh, R. Shukla & P.K. Bharadwaj, Coord. Chem. Rev. 160 (1997) 191.
68. M.H.M. Olsson, U. Ryde & B.O. Roos, Prot. Sci. 7 (1998) 2659.
69. S. Al-Karadaghi, E. Cedergren-Zeppezauer, Z. Dauter & K.S. Wilson, Acta Crystallogr. D51 (1995) 805.
70. A. Schäfer, C. Huber & R. Ahlrichs, J. Chem. Phys. 100 (1994) 5829.
71. A.D. Becke, Phys. Rev. A 38 (1988) 3098.
72. J.P. Perdew, Phys. Rev. B 33 (1986) 8822.
73. K. Eichkorn, O. Treutler, H. Öhm, M. Häser & R. Ahlrichs, Chem. Phys. Lett. 240 (1995) 283.
74. J.M. Guss, P.R. Harrowell, M. Murata, V.A. Norris & H.C. Freeman, J. Mol. Biol. 192 (1986) 361.
75. U. Ryde, Prot. Sci. 4 (1995) 1124.
76. U. Ryde, Eur. J. Biophys. 24 (1996) 213.
77. U. Ryde & L. Hemmingsen, J. Biol. Inorg. Chem. 2 (1997) 567.
78. A.B.P. Lever (1984) Inorganic electronic spectroscopy, pp. 58-66, Elsevier, Amsterdam.
79. J. Han, T.M. Loehr, Y. Lu, J.S. Valentine, B.A. Averill & J. Sanders-Loehr, J. Am. Chem. Soc. 115 (1993) 4256.
80. Y. Lu, J.A. Roe, E.B. Gralla & J.S. Valentine, in Bioinorganic chemistry of copper (K.D. Karlin & Z. Tyeklár, eds.), Chapman & Hall, New York (1993), p. 64.
81. C.R. Andrew, H. Yeom, J.S. Valentine, B.G. Karlsson, N. Bonander, G. van Pouderoyen, G.W. Canters, T.M. Loehr & J. Sanders-Loehr, J. Am. Chem. Soc. 116 (1994) 11489.
82. S.J. Kroes, C.W.G. Hoitink, C.R. Andrew, J.Y. Ai, J. Sanders-Loehr, A. Messerschmidt, W.R. Hagen & G.W. Canters, Eur. J. Biochem. 240 (1996) 342.
83. K.W. Penfield, A.A. Gewirth & E.I. Solomon, J. Am. Chem. Soc. 107 (1985) 4519.
84. A.A. Gewirth & E.I. Solomon, J. Am. Chem. Soc. 110 (1988) 3811.
85. L.B. LaCroix, S.E. Shadle, Y. Wang, B.A. Averill, B. Hedman, K.O. Hodgson & E.I. Solomon, J. Am. Chem. Soc. 118 (1996) 7755.
86. L.B. LaCroix, D.W. Randall, A.M. Nersissian, C.W.G. Hoitink, G.W. Canters, J.S. Valentine & E.I. Solomon, J. Am. Chem. Soc. 120 (1998) 9621.
87. J. Sanders-Loehr, in Bioinorganic chemistry of copper (K.D. Karlin & Z. Tyeklár, eds.), Chapman & Hall, New York (1993), p. 51.
88. E.T. Adman, J.W. Godden & S. Turley, J. Biol. Chem. 270 (1995) 27458.
89. K.K. Stavrev & M. Zerner, Int. J. Quant. Chem. Quantum Biol. Symp. 22 (1995) 155.
90. B.S. Brunschwig, S. Ehrenson & N. Sutin, J. Phys. Chem. 91 (1987) 4714.
91. Z. Zhou & S.U.M. Kahn, J. Phys. Chem. 93 (1989) 5292.
92. G. King & A. Warshel, J. Chem. Phys. 93 (1990) 8682.
93. C. Zheng, J.A. McCammon & P.G. Wolynes, Chem. Phys. 158 (1991) 261.
94. Y. Bu, S. Liu & X. Song, Chem. Phys. Lett. 227 (1994) 121.
95. H.-X. Zhou, J. Am. Chem. Soc. 116 (1994) 10362.
96. S. Larsson, A. Broo & L. Sjölin, J. Phys. Chem. 99 (1995) 4860.
97. I. Muegge, P.X. Qi, A.J. Wand, Z.T. Chu & A. Warshel, J. Phys. Chem. B 101 (1997) 825.
98. Y. Bu, Y. Ding, F. He, L. Jiang & X. Song, Int. J. Quant. Chem. 61 (1997) 117.
99. G.M. Soriano, W.A. Cramer & L.I. Krishtalik, Biophys. J. 73 (1997) 265.
100. K.A. Sharp, Biophys. J. 73 (1998) 1241.
101. Y.I. Kharkats & J. Ulstrup, Chem. Phys. Lett. 202 (1999) 320.
102. K. Sigfridsson, M. Sundahl, M.J. Bjerrum & O. Hansson, J. Biol. Inorg. Chem. 1 (1996) 405.
103. O. Farver & I. Pecht, Biophys. Chem. 50 (1994) 203.
104. O. Farver, L.K. Skov, G. Gilardi, G. van Pouderoyen & G.W. Canters, Chem. Phys. 204 (1996) 271.
105. J.R. Winkler, P. Wittung-Stafshede, J. Leckner, B.G. Malmström & H.B. Gray, Proc. Natl. Acad. Sci. USA 94 (1997) 4246.
106. A.J. Di Bilio, M.G. Hill, N. Bonander, B.G. Karlsson, R.M. Villahermosa, B.G. Malmström & H.B. Gray, J. Am. Chem. Soc. 119 (1997) 9921.
107. L.K. Skov, T. Pascher, J.R. Winkler & H.B. Gray, J. Am. Chem. Soc. 120 (1998) 1102.
108. F. Drepper, M. Hippler, W. Nitschke & W. Haehnel, Biochemistry 35 (1996) 1282.
109. G.R. Loppnow & E. Fraga, J. Am. Chem. Soc. 119 (1997) 896.
110. I. Zaitseva, V. Zaitsev, G. Card, K. Moshkov, B. Bax, A. Ralph & P. Lindley, J. Biol. Inorg. Chem. 1 (1996) 15.
111. T.E. Machonkin, H.H. Zhang, B. Hedman, K.O. Hodgson & E.I. Solomon, Biochemistry 37 (1998) 9570.
112. R.H. Holm, P. Kennepohl & E.I. Solomon, Chem. Rev. 96 (1996) 2239.
113. J.J.R. Fraústo da Silva & R.J.P. Williams (1994) The biological chemistry of the elements, Clarendon Press, Oxford.
114. H.-X. Zhou, J. Biol. Inorg. Chem. 2 (1997) 109.
115. B.R. James & R.J.P. Williams, J. Chem. Soc. (1961) 2007.
116. J. Leckner, P. Wittung, N. Bonander, B.G. Karlsson & B.G. Malmström, J. Biol. Inorg. Chem. 2 (1997) 368.
117. P. Wittung-Stafshede, M.G. Hill, E. Gomez, A.J. Di Bilio, B.G. Karlsson, J. Leckner, J.G. Winkler, H.B. Gray & B.G. Malmström, J. Biol. Inorg. Chem. 3 (1998) 367.
118. B.G. Malmström & J. Leckner, Curr. Opin. Chem. Biol. 2 (1998) 286.
119. E.I. Solomon, K.W. Penfield, A.A. Gewirth, M.D. Lowery, S.E. Shadle, J.A. Guckert & L.B. LaCroix, Inorg. Chim. Acta 243 (1996) 67.
120. T. Pascher, G. Karlström, M. Nordling, B.G. Malmström & T. Vänngård, Eur. J. Biochem. 212 (1993) 289.
121. B.G. Karlsson, L.-C. Tsai, H. Nar, J. Sanders-Loehr, N. Bonander, V. Langer & L. Sjölin, Biochemistry 36 (1997) 4089.
122. E.R. Dockal, T.E. Jones, W.F. Sokol, R.J. Engerer, D.B. Rorabacher & L.A. Ochrymowycz, J. Am. Chem. Soc. 98 (1976) 4322.
123. K.A. Sharp, Annu. Rev. Biophys. Biophys. Chem. 19 (1990) 301.
124. K.K. Rodgers & S.G. Sligar, J. Am. Chem. Soc. 113 (1991) 9419.
125. F.A. Tezcan, J.R. Winkler & H.B. Gray, J. Am. Chem. Soc. 120 (1998) 13383.
126. M.V. Botuyan, A. Toy-Palmer, J. Chung, R.C. Blake, P. Beroza, D.A. Case & H.J. Dyson, J. Mol. Biol. 263 (1996) 752.
127. G.T. Babcock & M. Wikström, Nature 356 (1992) 301.
128. S. Iwata, C. Ostermeier, B. Ludwig & H. Michel, Nature 376 (1995) 660.
129. M. Wilmanns, P. Lappalainen, M. Kelly, E. Sauer-Eriksson & M. Saraste, Proc. Natl. Acad. Sci. USA 92 (1995) 11955.
130. T. Tsukihara, H. Aoyama, E. Yamashita, T. Tomizaki, H. Yamaguchi, K. Shinzawa-Itoh, R. Nakashima, R. Yaono & S. Yoshikawa, Science 269 (1995) 1069.
131. B.G. Malmström & R. Aasa, Eur. J. Biochem. 325 (1993) 49.
132. H. Beinert, Eur. J. Biochem. 245 (1997) 521.
133. N.J. Blackburn, S. de Vries, M.E. Barr, R.P. Houser, W.B. Tolman, D. Sanders & J.A. Fee, J. Am. Chem. Soc. 119 (1997) 6135.
134. N.J. Blackburn, M.E. Barr, W.H. Woodruff, J. van der Oost & S. de Vries, Biochemistry 33 (1994) 10401.
135. S.E. Wallace-Williams, C.A. James, S. de Vries, M. Saraste, P. Lappalainen, J. van der Oost, M. Fabian, G. Palmer & W.H. Woodruff, J. Am. Chem. Soc. 118 (1996) 3986.
136. D.R. Gamelin, D.W. Randall, M.T. Hay, R.P. Houser, T.C. Mulder, G.W. Canters, S. de Vries, W.B. Tolman, Y. Lu & E.I. Solomon, J. Am. Chem. Soc. 120 (1998) 5246.
137. P.M.H. Kroneck, W.E. Antholine, D.H.W. Kastrau, G. Buse, G.C.M. Steffens & W.G. Zumft, FEBS Lett. 268 (1990) 274.
138. F. Neese, W.G. Zumft, W.E. Antholine & P.M.H. Kroneck, J. Am. Chem. Soc. 118 (1996) 8692.
139. J.A. Farrar, F. Neese, P. Lappalainen, P.M.H. Kroneck, M. Saraste, W.G. Zumft & A.J. Thomson, J. Am. Chem. Soc. 118 (1996) 11501.
140. M. Karpefors, C.E. Slutter, J.A. Fee, R. Aasa, B. Källebring, S. Larsson & T. Vänngård, Biophys. J. 71 (1996) 2823.
141. K.R. Williams, D.R. Gamelin, L.B. LaCroix, R.P. Houser, W.B. Tolman, T.C. Mulder, S. de Vries, B. Hedman, K.O. Hodgson & E.I. Solomon, J. Am. Chem. Soc. 119 (1997) 613.
142. J.A. Farrar, R. Grinter, F. Neese, J. Nelson & W.H. Thompson, J. Chem. Soc., Dalton Trans. (1997) 4083.
143. F. Neese, R. Kappl, J. Hüttermann, W.G. Zumft & P.M.H. Kroneck, J. Biol. Inorg. Chem. 3 (1998) 53.
144. S. Larsson (2000) "Evolution of energy saving electron pathways in proteins", J. Biol. Inorg. Chem., in press.
145. R.P. Houser, V.G. Young & W.B. Tolman, J. Am. Chem. Soc. 118 (1996) 2101.
146. J.A. Farrar, W.G. Zumft & A.J. Thomson, Proc. Natl. Acad. Sci. USA 95 (1998) 9891; J.A. Farrar, W.G. Zumft & A.J. Thomson (2000) "Dimeric Copper Centres - from CuA to CuZ", J. Biol. Inorg. Chem., in press.
147. M.H.M. Olsson & U. Ryde (2000) "Geometry, reduction potential, and reorganisation energy of the binuclear CuA site studied by theoretical methods", Proc. Natl. Acad. Sci. USA, submitted.
148. C. Ostermeier, A. Harrenga, U. Ermler & H. Michel, Proc. Natl. Acad. Sci. USA 94 (1997) 10547.
149. S. Yoshikawa, K. Shinzawa-Itoh, R. Nakashima, R. Yaono, E. Yamashita, N. Inoue, M.J. Fei, C.P. Libeu, T. Mizushima, H. Yamaguchi, T. Tomizaki & T. Tsukihara, Science 280 (1998) 280.
150. P.A. Williams, N.J. Blackburn, D. Sanders, H. Bellamy, E.A. Stura, J.A. Fee & D.A. McRee, Nature Struct. Biol. 6 (1999) 509.
151. G. Henkel, A. Müller, S. Weissgräber, G. Buse, T. Soulimane, G.C.M. Steffens & H.-F. Nolting, Angew. Chem. Int. Ed. 34 (1995) 1489.
152. R.P. Houser, J.A. Halfen, V.G. Young, N.J. Blackburn & W.B. Tolman, J. Am. Chem. Soc. 117 (1995) 10745.
153. J.A. Farrar, P. Lappalainen, W.G. Zumft, M. Saraste & A.J. Thomson, Eur. J. Biochem. 232 (1995) 303.
154. M. Kelly, P. Lappalainen, G. Talbo, T. Halitia, J. van der Oost & M. Saraste, J. Biol. Chem. 268 (1993) 16781.
155. D.W. Randall, D.R. Gamelin, L.B. LaCroix & E.I. Solomon, "Electronic structure contributions to electron transfer in blue copper proteins and CuA", J. Biol. Inorg. Chem., in press.
156. S. Larsson, B. Källebring, P. Wittung & B.G. Malmström, Proc. Natl. Acad. Sci. USA 92 (1995) 7167.
157. B.E. Ramirez, B.G. Malmström, J.R. Winkler & H.B. Gray, Proc. Natl. Acad. Sci. USA 92 (1995) 11949.
158. P. Brzezinski, Biochemistry 35 (1996) 5611.
159. K.R. Hoke, C.N. Kiser, A.J. Di Bilio, J.R. Winkler, J.H. Richards & H.B. Gray, J. Inorg. Biochem. 74 (1999) 165.
160. J.A. Farrar, W.G. Zumft & A.J. Thomson, Proc. Natl. Acad. Sci. USA 95 (1998) 9891.
161. J.A. Cowan (1997) Inorganic biochemistry, an introduction, Wiley-VCH, New York.
162. S.J. Lippard & J.M. Berg (1994) Principles of bioinorganic chemistry, University Science Books, Mill Valley.
163. G. Palmer & J. Reedijk, Eur. J. Biochem. 200 (1991) 599.
164. A.K. Churg & A. Warshel, Biochemistry 25 (1986) 1675.
165. K.M. Barkigia, J. Am. Chem. Soc. 110 (1988) 7566.
166. K.K. Rodgers & S.G. Sligar, J. Am. Chem. Soc. 113 (1991) 9419.
167. M.R. Gunner & B. Honig, Proc. Natl. Acad. Sci. USA 88 (1991) 9151.
168. R. Langen & A. Warshel, J. Mol. Biol. 224 (1992) 589.
169. M.R. Gunner, E. Alexov, E. Torres & S. Lipovaca, J. Biol. Inorg. Chem. 2 (1997) 126.
170. A. Warshel, J. Biol. Inorg. Chem. 2 (1997) 143.
171. P.J. Martel, J. Biol. Inorg. Chem. 4 (1999) 73.
172. R.K. Gupta, Biochim. Biophys. Acta 292 (1973) 291.
173. D.G. Nocera, J.R. Winkler, K.M. Yocom, E. Bordignon & H.B. Gray, J. Am. Chem. Soc. 106 (1984) 5145.
174. D.W. Dixon, X. Hong, S.E. Woehler, A.G. Mauk & B.P. Sishta, J. Am. Chem. Soc. 112 (1990) 1082.
175. F. Frolow, A.J. Smith, J.R. Guest & P.M. Harrison, Nature Struct. Biol. 1 (1994) 453.
176. S.C. Baker, N.F.W. Saunders, A.C. Willis, S.J. Ferguson, J. Hajdu & V. Fülöp, J. Mol. Biol. 269 (1997) 440.
177. W.R. Scheidt, Acc. Chem. Res. 10 (1977) 339.
178. W.R. Scheidt & C.A. Reed, Chem. Rev. 81 (1981) 543.
179. D.W. Cruickshank, Acta Crystallogr. D55 (1999) 583.
180. J.E. Newton & M.B. Hall, Inorg. Chem. 23 (1996) 4627.
181. H. Beinert, R.H. Holm & E. Münck, Science 277 (1997) 653.
182. J.G. Norman & S.C. Jackels, J. Am. Chem. Soc. 97 (1975) 3833.
183. R.A. Bair & W.A. Goddard, J. Am. Chem. Soc. 100 (1978) 5669.
184. G.M. Jensen, A. Warshel & P.J. Stephens, Biochemistry 33 (1994) 10911.
185. R.P. Christen, S.I. Spyros & E.T. Smith, J. Biol. Inorg. Chem. 1 (1996) 515.
186. P.D. Swartz, B.W. Beck & T. Ichiye, Biophys. J. 71 (1996) 2958.
187. P.D. Swartz & T. Ichiye, Biophys. J. 73 (1997) 2733.
188. I. Bertini, J. Biol. Inorg. Chem. 2 (1997) 114.
189. K.K. Stavrev, Int. J. Quant. Chem. 63 (1997) 781.
190. E.L. Bominaar, C. Achim, S.A. Borshch, J.-J. Girerd & E. Münck, Inorg. Chem. 36 (1997) 3689.
191. M. Czerwinski, Int. J. Quant. Chem. 72 (1999) 39.
192. L. Noodleman & E.J. Baerends, J. Am. Chem. Soc. 106 (1984) 2316.
193. L. Noodleman, J.G. Norman, J.H. Osborne, A. Aizman & D.A. Case, J. Am. Chem. Soc. 107 (1985) 3418.
194. J.-M. Mouesca, J.L. Chen, L. Noodleman, D. Bashford & D.A. Case, J. Am. Chem. Soc. 116 (1994) 11898.
195. J.-M. Mouesca, L. Noodleman & D.A. Case, Int. J. Quant. Chem. Quant. Biol. Symp. 22 (1995) 95.
196. L. Noodleman, C.Y. Peng, D.A. Case & J.-M. Mouesca, Coord. Chem. Rev. 144 (1995) 199.
197. L. Noodleman, J. Biol. Inorg. Chem. 1 (1996) 177.
198. L. Noodleman & D.A. Case, Adv. Inorg. Chem. 38 (1996) 423.
199. R.W. Lane, J.A. Ibers, R.B. Rankel, G.C. Papaefthymiou & R.H. Holm, J. Am. Chem. Soc. 99 (1975) 84.
200. R.G. Shulman, J. Mol. Biol. 124 (1978) 305.
201. E.T. Adman, K.D. Watenpaugh & L.H. Jensen, Proc. Natl. Acad. Sci. USA 72 (1975) 4854.
202. Brookhaven Protein Data Bank files 1cad, 1rb9, and 8rxn.
203. Brookhaven Protein Data Bank files 1caa, 1dfx, 1rdg, 4rxn, 5rxn, and 6rxn.
204. J.J. Mayerle, S.E. Denmark, B.V. DePamphilis, J.A. Ibers & R.H. Holm, J. Am. Chem. Soc. 97 (1975) 1032.
205. Brookhaven Protein Data Bank files 1frd, 1frr, 1fxi, 1qt9, and 4fxc.
206. S. Iwata, M. Saynovits, T.A. Link & H. Michel, Structure 4 (1996) 567.
207. C.J. Carrell, H. Zhang, W.A. Cramer & J.L. Smith, Structure 5 (1997) 1613.
208. M.D. Lowery, J.A. Guckert, M.S. Gebhard & E.I. Solomon, J. Am. Chem. Soc. 115 (1993) 3012.
209. R. Lumry & H. Eyring, J. Phys. Chem. 58 (1954) 110.
210. P. Ghosh, D. Shabat, K. Kumar, S.C. Sinha, F. Grynszpan, J. Li, L. Noodleman & E. Keinan, Nature 382 (1996) 339.
211. T.L. Poulos, J. Biol. Inorg. Chem. 1 (1996) 356.
212. L. Stryer (1995) Biochemistry, p. 218.
213. M. Levitt (1974) in Peptides, polypeptides and proteins (E.R. Blout, F.A. Bovey, M. Goodman & N. Lotan, eds.), p. 99, Wiley, New York.
214. A. Fersht (1985) Enzyme structure and mechanism, p. 341, W.H. Freeman & Co., New York.
215. J. Boström, P.-O. Norrby & T. Liljefors, J. Comput.-Aided Mol. Design 12 (1998) 383.
216. A.A. Gewirth, S.L. Cohen, H.J. Schugar & E.I. Solomon, Inorg. Chem. 26 (1987) 1133.
217. W.B. Church, J.M. Guss, J.J. Potter & H.C. Freeman, J. Biol. Chem. 261 (1986) 234.
218. D. Barrick, Curr. Opin. Biotechnol. 6 (1995) 411.
219. H. Nar, A. Messerschmidt, R. Huber, M. van der Kamp & G.W. Canters, FEBS Lett. 306 (1992) 119.
220. T.P.J. Garrett, J.M. Guss, S.J. Rogers & H.C. Freeman, J. Biol. Chem. 259 (1984) 2822.
221. W.E.B. Shepard, R.L. Kingston, B.F. Anderson & E.N. Baker, Acta Cryst. D49 (1993) 331.
222. N. Kitajima, K. Fujisawa, M. Tanaka & Y. Moro-Oka, J. Am. Chem. Soc. 114 (1992) 9232.
223. P.L. Holland & W.B. Tolman, J. Am. Chem. Soc. 121 (1999) 7270.
224. H.W. Hellinga, J. Am. Chem. Soc. 120 (1998) 10055.
225. K. Nilsson, G. Kleywegt & U. Ryde (2000) "Quantum chemical refinement of crystal structures", manuscript in preparation.
L.A. Eriksson (Editor)
Theoretical Biochemistry: Processes and Properties of Biological Systems
Theoretical and Computational Chemistry, Vol. 9
© 2001 Elsevier Science B.V. All rights reserved
Chapter 2
Myoglobin
D. Karancsi-Menyhárdᵃ, G. Keserűᵇ and G. Náray-Szabóᵃ
ᵃDepartment of Theoretical Chemistry, Loránd Eötvös University, H-1117 Budapest, Pázmány Péter st. 1A, Hungary
ᵇChemical and Biotechnological Research and Development, Gedeon Richter Pharmacochemical Works, H-1475 Budapest, P.O. Box 27, Hungary
1. INTRODUCTION
Myoglobin, a small protein (molecular weight 16.7 kDa), contains 152 amino acid residues organised into a single polypeptide chain and uses a prosthetic heme group to fulfil its function. Although myoglobin is relatively small, its biological function is of utmost importance: it is the oxygen transport protein in muscles. Like the larger and more complicated hemoglobin located in red blood cells, myoglobin is responsible for the reversible binding of molecular oxygen. A large number of natural mutations allowed the contributions of individual amino acids in the primary structure to the structure and function of the protein to be studied long before genetic engineering made this possible with nearly any protein. This, combined with the vast amount of information obtained from X-ray crystallography and other experiments, allows several precise models to be constructed for calculations and makes this protein an ideal target for such studies. Since hemoglobin is a tetramer, its structure and function are more complicated than those of myoglobin, so it is not surprising that the vast majority of computational studies published to date deal exclusively with myoglobin. In the following, we survey these studies, which focus mostly on dynamic aspects of structure and function, though the electronic and electrostatic aspects of ligand binding are also treated.
2. CONFORMATION AND STRUCTURAL DYNAMICS
It is not surprising that myoglobin was the first protein whose structure was determined by X-ray crystallography [1,2] (Figure 1). The polypeptide chain of myoglobin forms eight helices, which make up almost 70% of the protein. The protoporphyrin IX heme unit is located within a pocket formed by two helices, and its aliphatic nonpolar side chains are oriented toward the compact hydrophobic core. The heme unit is anchored by His93, which is referred to as the proximal histidine, since it is the closest amino acid on the near (proximal) side of the heme binding pocket. The other, more spacious distal side forms the ligand binding cavity above the iron.
Figure 1. Three-dimensional structure of wild-type metmyoglobin on the basis of the Protein Data Bank file 1YMB by S.V. Evans and G.D. Brayer, J. Mol. Biol., 213 (1990) 885.
The most characteristic amino acid residue of the distal side is again the residue closest to the iron, the distal histidine (Figure 2). The globular structure of myoglobin is characteristic of all globins and excludes the formation of large cavities in the protein interior. Since this is a mainly hydrophobic region, very few structural water molecules have been located inside the protein. Polar groups are located almost exclusively on the surface of myoglobin and are hydrated.
Figure 2. The active-site residues of myoglobin.
Dramatic developments in instrumentation and computational techniques, as well as the recently gained in-depth knowledge in structural and molecular biology, have drawn considerable interest to the internal motions of proteins. The availability of the three-dimensional structure of myoglobin prompted a number of theoretical studies on its structural dynamics, including some of significant methodological importance (cf. e.g. [3]). Molecular dynamics (MD) simulations, the most suitable theoretical tool for the investigation of internal motions, can be used to explore both equilibrium properties and time-dependent phenomena. Based on both experimental and theoretical observations, two models for the internal motion of proteins have been suggested. Within the framework of the first model, internal motions arise from harmonic or quasi-harmonic vibrations that occur in a single multidimensional well on the potential energy surface [4,5,6,7]. The second model assumes that motions are a superposition of oscillations within a well and
60
transitions take place among different conformational substates separated by low energy barriers on the multi-minimum surface [8,9]. Based on a 300 ps long MD simulation of myoglobin performed at 300 K Elber and Karplus concluded that its potential energy surface is characterised by a large number of minima close to the crystal structure separated by barriers permeable for thermal motions [3]. This means that the protein undergoes frequent transitions from one minimum to another resulting in fluctuations. Separating the fluctuation of main chain and side chain atoms it was proposed that 20 to 30 % of main chain fluctuations are located within an energy well (in-well fluctuations) [10] while 70 to 80 % arise from transitions between wells. Side chain fluctuations have an even larger contribution from transitions. Comparing the position of non-hydrogen atoms they concluded that minima mainly differed in the orientation of helices. Reorientation of helices were coupled with side-chain rearrangements to maintain close packing within the protein. Extrapolating from fluctuations observed at 300 K in-well fluctuations were expected to decrease linearly with temperature while transitions between wells were proposed to become limited. Analysis of distance matrices between two conformations revealed that large deviations of corresponding carbon atoms are associated primarily with loop displacements or reorganisation of helices relative to each other. Time-resolved analysis of carbon positions, however, indicated structural rearrangements to be localised to some of the loop regions. This observation suggested that loop motions are the elementary step for a transition process and are coupled with associated helix displacements. Elber and Karplus proposed loop rearrangements to be initiated by side-chain reorganisations occurred in helices or dihedral angle transformations within the loops [3]. 
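The distance-matrix comparison of two conformations can be illustrated with a short NumPy sketch. It is not the procedure of Elber and Karplus: it assumes two conformations given as (N, 3) arrays of corresponding atoms (for example Cα atoms), and the helper names and the median-based scoring rule are hypothetical choices made for illustration. Because interatomic distances are unchanged by rotation and translation, no superposition is needed.

```python
import numpy as np

def distance_matrix(coords: np.ndarray) -> np.ndarray:
    """All pairwise interatomic distances for an (N, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def distance_difference(conf_a: np.ndarray, conf_b: np.ndarray) -> np.ndarray:
    """|d_ij(A) - d_ij(B)| for corresponding atoms; needs no superposition."""
    return np.abs(distance_matrix(conf_a) - distance_matrix(conf_b))

def most_displaced(conf_a: np.ndarray, conf_b: np.ndarray, top: int = 5) -> np.ndarray:
    """Indices of atoms whose median distance change to all other atoms is largest.

    The median flags an atom only when it moves relative to most of the
    structure, not merely relative to a few displaced neighbours.
    """
    score = np.median(distance_difference(conf_a, conf_b), axis=1)
    return np.argsort(score)[::-1][:top]

# Hypothetical example: B is A rotated and translated, with two "loop"
# atoms (3 and 7) additionally displaced by 8 angstrom.
rng = np.random.default_rng(0)
conf_a = rng.uniform(0.0, 20.0, size=(20, 3))
ang = np.radians(30.0)
rot = np.array([[np.cos(ang), -np.sin(ang), 0.0],
                [np.sin(ang), np.cos(ang), 0.0],
                [0.0, 0.0, 1.0]])
conf_b = conf_a @ rot.T + np.array([5.0, -2.0, 1.0])
conf_b[[3, 7]] += np.array([8.0, 0.0, 0.0])
print(most_displaced(conf_a, conf_b, top=2))
```

The rigid rotation and translation leave the distance-difference matrix at zero, so only the two displaced atoms stand out.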
Large temperature factors obtained for side chains by high-resolution X-ray crystallography are consistent with a possible involvement of side-chain motions in helix packing [11,12]. Demonstrating the relevance of multiple conformational states underlined the importance of structural inhomogeneities in protein function. Since the heme pocket of myoglobin lies in the protein interior, there is no evident path for the ligand to enter or leave the binding site; fluctuations must therefore be involved in the entry and exit of ligands. MD simulations revealed that the main chain is relatively rigid within secondary structure elements (solid-like behaviour), whereas loops and side-chain clusters at inter-helix contacts are flexible enough to reorganise when the helices move from one minimum to another (liquid-like behaviour) [13].
The relative importance of rigid-helix and side-chain motions was studied by Corbin et al. [14], who approximated the contributions of two types of rigid-body motion. In the first set of calculations the helices of deoxymyoglobin were treated as rigid units, while in the second set the side chains were considered to be rigid. Trajectories of the rigid-body motion were derived by fitting the rigid reference structures to each time frame. The fitted trajectories were analysed in terms of atomic positional fluctuations and rms displacements as a function of time, and these properties were compared with those of the full trajectory. This analysis showed that the relative contributions of helix and side-chain motions depend on the property compared. Rigid-helix motions were found to contribute 86% of the backbone atomic positional fluctuations but only 30% of the rms displacements, indicating that only low-frequency motions contribute to the rigid-helix dynamics. In contrast, treating the side chains as rigid bodies was found to be a good approximation, since 96% of the side-chain atom displacements originated from rigid side-chain motions.
Since the low energy conformations of myoglobin are close in energy and separated by low barriers, Elber and Karplus [3] concluded that the protein should be glass-like, i.e. that it might be trapped in metastable conformational states at low temperatures, as was shown by Stein [15]. Kuczera et al. gave further insight into this temperature effect by MD simulations of the internal motions of carbonmonoxy myoglobin at 325 K and 80 K [16]. Comparison of myoglobin crystal structures measured at 300 K and 80 K revealed a temperature-dependent, inhomogeneous thermal contraction associated with a shift of the CD region, the E helix and the EF loop [17,18,19,20]. The contraction observed at low temperatures can be explained by anharmonic effects. An alternative interpretation of this thermal effect is based on the multiple-minima concept of the potential energy surface introduced by Elber and Karplus [3]. In this latter picture the observed structure is an average over a series of minima whose weights vary with temperature.
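Deriving a rigid-body trajectory by fitting a rigid reference structure to every frame, as in the study of Corbin et al., can be sketched with the Kabsch least-squares superposition. This is a minimal illustration, assuming coordinates of one rigid unit (a helix or a side chain) stored as a NumPy array of shape (n_frames, n_atoms, 3); the function names are hypothetical and the original work may have used a different fitting procedure.

```python
import numpy as np

def kabsch_fit(reference: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Best rigid-body (rotation + translation) copy of `reference` on `frame`.

    Both arrays have shape (N, 3); the least-squares rotation is obtained
    with the Kabsch algorithm (SVD of the 3x3 covariance matrix).
    """
    ref_centre = reference.mean(axis=0)
    frame_centre = frame.mean(axis=0)
    p = reference - ref_centre
    q = frame - frame_centre
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # d = -1 would signal a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + frame_centre

def rigid_body_trajectory(reference: np.ndarray, trajectory: np.ndarray) -> np.ndarray:
    """Fit the rigid reference to every frame of an (n_frames, N, 3) trajectory."""
    return np.stack([kabsch_fit(reference, frame) for frame in trajectory])

def rms_fluctuation(trajectory: np.ndarray) -> np.ndarray:
    """Per-atom rms fluctuation about the time-averaged position."""
    deviation = trajectory - trajectory.mean(axis=0)
    return np.sqrt((deviation ** 2).sum(axis=-1).mean(axis=0))
```

Comparing `rms_fluctuation(rigid_body_trajectory(ref, traj))` with `rms_fluctuation(traj)` atom by atom then gives one simple measure of how much of the motion a rigid-body model captures.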
In the multiple-minima picture, the contraction can be interpreted as a temperature-dependent change of the average structure. Changes in fluctuations at different temperatures were also monitored by MD simulations. Particular attention has been paid to the experimentally observed transition in protein dynamics below 180 K. Since neutron scattering data obtained in vacuo revealed such a transition at about 200 K, the simulations were performed in vacuo rather than with an explicit solvent model [21]. The possible involvement of solvent effects has been pointed out by Parak et al., who suggested that the transition may be due to a change in the properties of the solvent [22]. Comparison of the experimental and theoretical radii of gyration showed that the experimentally observed inhomogeneous contraction was reasonably well reproduced, and the most flexible region of the protein also agreed with the experimental data. The overall fluctuations obtained from the X-ray data at 260 K and from the simulation at 325 K were found to be similar, but their temperature dependence differed significantly. According to the X-ray data, the fluctuations were reduced by a factor of 1.2 on going from 260 K to 80 K, whereas the simulation data gave a reduction by a factor of almost 4. For a single harmonic well this ratio was calculated to be 2, which is significantly smaller than the simulated value and larger than the value obtained by X-ray diffraction analysis. Bearing in mind that the rms ratio derived from the X-ray data is affected by the large static disorder contained in crystallographic B-factors, this comparison clearly supports the multiple-minima model suggested by Elber and Karplus [3].
Low- and high-temperature fluctuations were also compared by analysing the corresponding dihedral angles. This comparison suggested that large backbone fluctuations are usually associated with large side-chain motions, but that the reverse is not always true. The largest fluctuations were detected close to the protein surface, while dihedral transitions were mostly limited to side chains: six residues showed bimodal distributions of backbone dihedrals at 325 K, whereas no backbone dihedral transitions were detected at 80 K. Together with the difference between the atomic fluctuations calculated at 325 K and 80 K, this indicates that the low-temperature trajectory sampled only a single minimum of the potential energy surface.
In addition to monitoring atomic fluctuations along dynamic trajectories, Henry suggested using the heme orientation as a probe of dynamic processes in myoglobin [23]. Analysing heme reorientation motions, he found that variations in heme orientation correlate with the structural dynamics of the protein on a time scale of hundreds of picoseconds, which allows heme displacement to be used as a measure of flexibility within the protein. A potential application of this probe is the MD investigation of proximal mutants of deoxymyoglobin [24].
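The factor of 2 quoted above for the single-well model follows from classical equipartition: in a harmonic well the mean-square fluctuation is proportional to temperature, so rms amplitudes scale with the square root of T. The short check below assumes that the quoted ratios refer to rms amplitudes and that the relevant temperature pairs are 325 K/80 K for the simulation and 260 K/80 K for the X-ray data; the function name is hypothetical.

```python
import math

def harmonic_rms_ratio(t_high: float, t_low: float) -> float:
    """Expected ratio of rms fluctuations in a classical harmonic well.

    Equipartition gives k<x^2> = k_B*T, so <x^2> is proportional to T
    and the rms amplitude scales with sqrt(T).
    """
    return math.sqrt(t_high / t_low)

print(round(harmonic_rms_ratio(325.0, 80.0), 2))  # simulation temperatures -> 2.02
print(round(harmonic_rms_ratio(260.0, 80.0), 2))  # X-ray temperatures -> 1.8
```

Even for the X-ray temperature pair the harmonic expectation (about 1.8) lies well above the observed factor of 1.2, in line with the static-disorder argument, while the simulated factor of almost 4 exceeds the harmonic value, as expected if inter-well transitions freeze out at low temperature.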
Although this was not recognised in Nowak's study of proximal mutants [24], the extent of heme reorientation clearly depends on the nature of the proximal amino acid, which should affect the internal dynamics of the individual proximal mutants. Indeed, some mutants were found to be more flexible than the native protein.
Smith et al. explored the picosecond internal dynamics of myoglobin by inelastic neutron scattering [25]. At low temperatures the dynamics were found to be harmonic, while at higher temperatures considerable quasielastic scattering was detected. The experimentally observed spectra agreed with those calculated from MD simulations, and both showed evidence that the conformational space sampled at 80 K is restricted relative to that at 300 K. On the basis of these results it was concluded that the protein is trapped in local minima at low temperatures, in accord with the multiple-substate model suggested by low-temperature flash photolysis experiments and previous MD simulations. Comparison of the atomic fluctuation data sets collected at 325 K and 80 K confirms that the room-temperature flexibility of myoglobin is associated with motions of the loop regions and the corresponding reorientation of side chains.
Side-chain dynamics were also studied by neutron scattering experiments [26]. Above 200 K the atomic motion in myoglobin contains a non-vibrational component, which is responsible for the characteristic elastic and quasi-elastic profiles in the neutron scattering spectra. Since non-vibrational dynamics is expected to be required for protein function, neutron scattering is a valuable tool for exploring structural dynamics at the atomic level. In a set of different MD calculations, Kneller and Smith reproduced the neutron scattering results for myoglobin with a simulation that treated the protein side chains as rigid bodies [26]. On the basis of the excellent agreement obtained, they concluded that the neutron scattering profiles result from liquid-like rigid-body motion of the protein side chains. This is in agreement with the calculations of Corbin et al. [14], which predicted that the displacements of side-chain atoms originate in rigid-body side-chain motions.
Although early MD investigations of myoglobin were performed in vacuo, a number of experimental findings, including X-ray, NMR and neutron scattering data, suggest that solvation plays an important role in the dynamics and function of the protein. One of the first investigations of myoglobin hydration was performed by Danziger and Dean as a test of their site-mapping algorithm [27]. The predictive power of the method was confirmed by the agreement between the predicted locations of the 384 water molecules and the crystallographic results. Schmidt et al. used a Monte Carlo approach to model the distribution of water coordinates in the crystallographically invisible part of the unit cell of myoglobin [28].
The Monte Carlo calculations were started from different initial water structures, and the differences between the final distributions were used to calculate the rms displacements of the water molecules. The resulting displacement of 0.58 Å was in fair agreement with that obtained from experimental data, demonstrating the effectiveness of Monte Carlo techniques in protein hydration studies. A more detailed study of the hydration dynamics of myoglobin was published by Gu and Schoenborn [29], who evaluated the stability of the 89 bound water molecules observed by neutron diffraction on carbonmonoxy myoglobin. Starting from the experimental structure, the protein was solvated, minimised and equilibrated. A 50-ps Newtonian dynamics simulation of this system revealed that only four water molecules remained continuously bound, whereas the other crystallographic waters showed significant mobility; most of the water molecules broke and re-formed hydrogen bonds with the protein. Analysing the trajectories, the authors concluded that 73 of the hydration sites observed in the neutron structure of carbonmonoxy myoglobin are occupied by water. This simulation explained the differences in hydration observed in NMR, neutron and X-ray diffraction experiments. A comparison between the solution-phase structure of myoglobin and that determined by single-crystal diffraction showed that water molecules play an important role in crystal packing, and the authors concluded that hydration has an indispensable effect on the dynamics and stability of the protein [30].
The protein-solvent interface was studied by MD simulations of metmyoglobin in an explicit solvent environment of 3182 water molecules [31]. The calculated structure and dynamics of the hydrated surface, characterised by three-dimensional density distributions, temperature factors and occupancy weights of the solvent molecules, were similar to those obtained by experimental methods. On the basis of the trajectories, multiple solvation layers comprising more than 500 hydration sites were identified around the protein surface. The properties of the calculated hydration clusters were compared with those obtained from neutron and X-ray data, and the simulation was found to unify the hydration pictures provided by X-ray and neutron diffraction experiments.
The functional role of hydration may be explained by solvation effects, which have been found to be crucial for interpreting anharmonic fluctuations and also the folding of myoglobin. Experimental studies suggested that, in contrast to the harmonic fluctuations observed at low temperatures, the equilibrium dynamics of hydrated myoglobin at physiological temperatures is highly anharmonic [32]. This temperature dependence is in accordance with the multiple conformational substates calculated by Elber and Karplus for non-hydrated myoglobin [3]. Since Mössbauer studies revealed that hydrated myoglobin exhibits more anharmonic fluctuations than the dehydrated protein, one can conclude that hydration might enhance the exploration of conformational substates [33].
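The kind of trajectory analysis used to decide whether crystallographic waters stay bound can be sketched as follows. This is an illustrative simplification, not the protocol of Gu and Schoenborn: it assumes that, for every frame, the distance of each crystallographic water oxygen from its starting position has already been computed, and both the 1.0 Å cutoff and the function name are hypothetical.

```python
import numpy as np

def residence_summary(displacement: np.ndarray, cutoff: float = 1.0):
    """Summarise how long crystallographic waters stay at their sites.

    displacement: (n_frames, n_waters) distances in angstrom of each water
    oxygen from its starting (crystallographic) position.
    Returns the fraction of frames each water stays within `cutoff` and a
    boolean flag marking waters that never leave their site.
    """
    inside = np.asarray(displacement) < cutoff
    return inside.mean(axis=0), inside.all(axis=0)

# Hypothetical 3-frame, 3-water example (frame 0 is the starting structure)
d = np.array([[0.0, 0.0, 0.0],
              [0.4, 1.5, 2.5],
              [0.3, 0.2, 0.6]])
fraction, always_bound = residence_summary(d)
print(fraction, always_bound)
```

A water that leaves and later returns to its site counts as transiently bound here, which is one simple way to separate continuously bound waters from mobile ones.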
Another important effect of hydration is thought to be its influence on dihedral transitions. Anharmonic fluctuations of the non-hydrated protein were explained by torsional transitions from one minimum to another and by atomic mean-square fluctuations [16,34]. To characterise the functionally important motions in hydrated myoglobin, Steinbach and Brooks performed simulations of its hydrated CO complex [35], investigating the temperature and hydration dependence of the equilibrium dynamics. Two sets of MD simulations of dehydrated carbonmonoxy myoglobin, with and without torsional restraints, at temperatures between 100 K and 400 K were compared with simulations of the hydrated protein. The dehydrated protein exhibited almost exclusively harmonic fluctuations at all temperatures, whereas pronounced anharmonic motions were detected in the hydrated protein at about 200 K, regardless of whether the torsions were restrained. The authors concluded that the anharmonic fluctuations observed in hydrated myoglobin are not primarily due to dihedral transitions; in the absence of these transitions, local dihedral fluctuations and anharmonic motion of whole helices were observed. Besides facilitating anharmonic motions, the most important consequence of hydration is the lowering of the barriers between conformational substates. The addition of water changes the potential energy surface and influences protein motions through van der Waals and electrostatic interactions. Steinbach and Brooks demonstrated that electrostatic interactions formed upon hydration might be responsible for the increased mobility of the solvated protein [36]. Using an adaptation of principal component analysis based on singular value decomposition (SVD), Andrews et al. recently presented support for this hypothesis [37]. The SVD analysis was also used to characterise the dynamics of the protein: since the proposed hierarchical structure of conformational substates is easily quantified by this technique, the authors analysed the total conformational space of hydrated myoglobin and showed that the protein hops between a number of distinct global conformational states. Considering that native proteins are solvated, these results suggest that the structural dynamics of proteins, and of myoglobin in particular, cannot be interpreted with a purely harmonic approach, and that calculations of protein dynamics should also take solvation effects into account. Careful consideration of solvation effects was also shown to be of primary importance in folding calculations. Brooks characterised heme-free apomyoglobin in aqueous solution [38].
Analysis of the structure and motion of apomyoglobin and of a proposed folding intermediate, the less compact acid-stabilised I state, revealed that a subdomain formed by helices A, G and H is relatively rigid. This finding suggested that the folding of apomyoglobin involves an early intermediate in which these helices form a compact, pre-organised subdomain in a native-like conformation. Using MD simulations of isolated, solvated helices of apomyoglobin, Hirst and Brooks investigated the relationship between the intrinsic stability of the helices and the structure and folding pathway of the protein [39]. The relative stabilities of the helices were explored by analysing hydrogen bonding and fluctuations at 298 K and 368 K. The calculated order of stability, A > G > H > B > E > F, is in accordance with experimental equilibrium and kinetic data. Combining this result with the experimental observation that a subdomain containing the most stable helices, A, G and H, forms an early folding intermediate, the authors suggested that this subdomain is crucial to the folding pathway.
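Principal component analysis of an MD trajectory via singular value decomposition, the general technique on which the analysis of Andrews et al. [37] builds, can be sketched in a few lines. This is the textbook form, not their adaptation. It assumes frames already superimposed on a common reference and stored as an array of shape (n_frames, n_atoms, 3). The function name and the two-state test trajectory are hypothetical.

```python
import numpy as np

def principal_components(trajectory: np.ndarray):
    """PCA of atomic fluctuations by singular value decomposition.

    trajectory: (n_frames, n_atoms, 3) coordinates, assumed to be already
    superimposed on a common reference. Returns the variance along each
    mode, the projection of every frame on each mode, and the modes.
    """
    n_frames = trajectory.shape[0]
    x = trajectory.reshape(n_frames, -1)
    x = x - x.mean(axis=0)                    # fluctuations about the mean structure
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    variance = s ** 2 / (n_frames - 1)        # eigenvalues of the covariance matrix
    projections = u * s                       # equals x @ vt.T
    return variance, projections, vt

# Hypothetical trajectory hopping between two substates along one collective mode
rng = np.random.default_rng(2)
n_frames, n_atoms = 200, 15
base = rng.normal(size=(n_atoms, 3))
mode = rng.normal(size=(n_atoms, 3))
mode /= np.linalg.norm(mode)
state = np.where(np.arange(n_frames) % 40 < 20, -1.0, 1.0)
traj = base + 3.0 * state[:, None, None] * mode + 0.01 * rng.normal(size=(n_frames, n_atoms, 3))
variance, proj, modes = principal_components(traj)
print(variance[0] / variance.sum())
```

Hopping between distinct states shows up as clustered projections on the leading components (bimodal here, near +3 and -3 on the first mode), whereas purely harmonic motion within one well gives a single Gaussian-like cluster.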
3. COMPLEXES WITH VARIOUS LIGANDS
How globin proteins perform their physiological role of O2 transport and storage in the presence of significant CO concentrations has long been a focus of research. O2 binds to free porphyrin in a bent conformation, which in the protein is stabilised by a hydrogen bond to the distal histidine [40]. CO binds to free porphyrin perpendicular to the heme plane up to 25,000 times more strongly than O2 [41]; in myoglobin, however, this ratio is reduced to only 30 [42]. The crystal structures of wild-type myoglobin-CO complexes show a range of quite different Fe-CO conformations [43,44,45,46], but in all cases the Fe-CO unit is bent and tilted away from the heme normal. Infrared polarisation measurements, on the other hand, indicate a nearly upright geometry for CO in myoglobin [47,48]. The very large change in the relative affinities of the ligands for the heme in the protein environment was at first attributed to steric repulsion between the CO ligand and the distal histidine [49,50]. However, it was hard to see how such a large strain could be exerted by a solvent-exposed, mobile side chain. Later infrared measurements showed that the interaction between CO and the distal histidine is electrostatic and stabilising in nature [51], as in the case of O2, though less pronounced. Anfinrud and co-workers proposed a completely new solution to the problem of ligand discrimination: relying on infrared spectroscopic results, they suggested that the difference in binding rate between CO and O2 is due to the unfavourable (parallel) orientation of CO in its trapped state, which hinders its fast rebinding [47,52]. It therefore remains a matter of debate exactly how the protein accomplishes such a marked discrimination between the two ligands. The ligand-bound state of Mb-CO has been characterised by infrared measurements of the CO stretch, which show three distinct lines known as the A states [53].
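To put the affinity ratios quoted for CO and O2 on an energy scale, an equilibrium-constant ratio K1/K2 corresponds to a free-energy difference RT ln(K1/K2). The sketch assumes that the ratios of 25,000 (free porphyrin) and 30 (myoglobin) refer to equilibrium binding constants at about 298 K. Under that assumption, the protein reduces the CO/O2 discrimination by roughly 17 kJ/mol. The function name is hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def ddg_from_ratio(ratio: float, temperature: float = 298.15) -> float:
    """Free-energy difference (kJ/mol) equivalent to an equilibrium-constant ratio."""
    return R_KJ * temperature * math.log(ratio)

free_heme = ddg_from_ratio(25_000)  # CO vs O2 preference of free heme
myoglobin = ddg_from_ratio(30)      # CO vs O2 preference in myoglobin
print(f"{free_heme:.1f} {myoglobin:.1f} {free_heme - myoglobin:.1f}")  # 25.1 8.4 16.7
```

Because free energies add where binding constants multiply, the logarithmic scale makes it easier to compare such ratios with the interaction energies computed in the studies discussed below.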
It has been suggested that the multiple A-state peaks originate from the structural and electrostatic variability of the distal pocket [54]. Support for this notion has come from recent crystal structures of Mb-CO that show discrete disorder of the distal histidine [55,56,57], revealing the presence of several different conformers of His64 even at room temperature [58]. In light of the above, the exact protonation motif, tautomeric state and solution-state orientation of His64 became critical questions; they have been addressed by several experimental methods but have not been unequivocally clarified to date [44,59]. Another diatomic ligand of the globins, NO, has also received considerable attention recently. NO is a redox-active, highly polar ligand that can bind to the heme of globins in all naturally occurring oxidation states and is even responsible for the interconversion of these states [60,61,62,63]. NO has been recognised as the endothelium-derived relaxing factor [64,65] and as a key intracellular signal and defensive cytotoxin in the nervous, muscular, cardiovascular and immune systems. The dynamics of the metMb-NO complex have been shown to be quite similar to those of its isoelectronic counterpart, Mb-CO [66,67]. The crystal structure of Mb[Fe(II)]-NO has also been solved [68,69].
The free energy perturbation study of Lopez and Kollman showed that the difference in the relative binding affinities of O2 and CO for Mb is predominantly due to electrostatic interactions [70]. The MbO2 complex was mutated to MbCO in two steps: mutation of the ligand, followed by mutation of the distal histidine to the corresponding tautomer. The simulation reproduced the crystallographic results for the Fe-C-O unit, which was found to be bent (bend angle 159.6°) and tilted (tilt angle 6.2°). The contributions of all residues within 9 Å of the iron were considered. The proximal histidine and the heme group showed the largest free energy differences upon mutation, followed by the distal histidine and four other nearby residues. The electrostatic nature of the interaction between the distal histidine and CO has been further probed by MD simulations [71]. The goal was to resolve puzzling neutron scattering results [44,72] that showed the distal histidine protonated at Nδ, with the proton pointing out into the solvent and the negative electrostatic potential of the lone pair of the unprotonated nitrogen reaching close to the carbonyl ligand, quite the opposite of what was expected. Four 90-ps MD simulations were carried out on the myoglobin-CO complex, two with the distal histidine protonated at Nδ and two with Nε protonation. The protein was solvated with explicit TIP3P water molecules, and all waters within 16 Å of the CO in the minimised structure were included in the dynamics simulation [73]. Amino acid residues outside the 16 Å radius of the CO were weakly restrained to their solvated positions, as were distant water molecules, to prevent the solvent from boiling off.
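The free energy perturbation approach used by Lopez and Kollman rests on the Zwanzig relation, ΔG = -kT ln⟨exp(-ΔU/kT)⟩_A with ΔU = U_B - U_A, usually applied over many small steps along a coupling parameter λ. The sketch below only illustrates that estimator on arrays of energy differences. It assumes the ΔU samples (in kcal/mol) have already been collected from simulations of each window, the function names are hypothetical, and it is not a reconstruction of their protocol.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1

def fep_zwanzig(delta_u, temperature=300.0):
    """Zwanzig (exponential averaging) estimate of dG for one perturbation step.

    delta_u: samples of U_B - U_A in kcal/mol, evaluated on configurations
    sampled from state A.
    """
    beta = 1.0 / (K_B * temperature)
    x = -beta * np.asarray(delta_u, dtype=float)
    x_max = x.max()  # log-sum-exp shift avoids overflow in exp()
    return -(x_max + np.log(np.mean(np.exp(x - x_max)))) / beta

def staged_fep(windows, temperature=300.0):
    """Sum the per-window estimates along a lambda path (forward direction only)."""
    return sum(fep_zwanzig(w, temperature) for w in windows)
```

For a Gaussian distribution of ΔU the estimator reduces to ⟨ΔU⟩ - σ²/(2kT), which is a convenient sanity check; a two-step mutation such as ligand first and histidine tautomer second is simply a sum over two such stages.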
The MD simulations of the Mb-CO complex [71] showed that the solution-phase orientation of the distal histidine differs from that measured in the crystal: both His64 tautomers had the side chain oriented with the polar H inside the pocket. The NεH tautomer forms a stable electrostatic interaction with the oxygen of CO, while the NδH tautomer interacts only weakly with the ligand. The calculations suggest that the three broad CO stretching peaks in the infrared spectrum of MbCO (the A states) arise from the different protein environments created by the different protonation motifs of the distal histidine, rather than from different Fe-CO binding arrangements. The A3 state was assigned to the 64NεH tautomer, and states A1 and A2 (not resolved at room temperature) to the 64NδH tautomer. The A0 state was assigned to a conformation in which the distal histidine flips out of the heme pocket. The same model and computational protocol were used in another MD study of distal-side effects [74], in which three distal-side mutants were studied. The His64Gln mutation resulted in a conformation similar to that of the wild-type 64NεH tautomer, the His64Leu mutant showed little interaction with the ligand, while the His64Gly mutant allowed free movement of water molecules into and out of the active site.
In all MD simulations the energy minimum corresponds to a perpendicular orientation of CO, indicating that steric effects have little influence on the ligand conformation. On the other hand, the crystal structures of the wild type and of the His64Gly mutant (in which steric repulsion is minimised) show the carbon monoxide ligand in a bent conformation. The authors proposed that the failure of MD to find significantly distorted Fe-CO arrangements may be the result of force fields that do not allow the proximal residue to couple with the ligand via the metal d orbitals. This assumption was based on previous quantum mechanical studies [75,76] suggesting that the distorted binding (destabilisation) of CO in the protein is largely determined by the non-equilibrium orientation of the proximal histidine, which anchors the heme group in the active-site pocket. The model applied was constructed on the basis of the crystal structure of Mb-CO [43] and contained two amidinate moieties representing the heme group, the distal and proximal histidines as imidazoles, and the CO ligand (Figure 3A). Both imidazoles were slightly rotated to maintain the overall Cs symmetry of the model. The equatorial ligands were described by the minimal STO-3G basis set, the axial ligands by Dunning's (9s/5p) basis and the imidazoles by the 3-21G basis set. Applying a pseudopotential at the Fe atom, the authors optimised the bend and tilt angles and the Fe-CO bond length at the MP2 level, both in the presence and in the absence of the distal histidine. In addition, in the free heme model the orientation of the proximal histidine was also allowed to relax, and the effect of conformational changes of the proximal residue was tested systematically. Twisting the proximal histidine outside its equilibrium range of bending angles produced a large change in CO orientation, of up to 30°, suggesting that this angle is the key determinant of the Fe-CO geometry. A perpendicular CO conformation resulted when the proximal side chain was allowed to relax. The authors also found that in the protein model the Fe-CO unit remains significantly distorted even in the absence of the distal histidine, owing to the distorted orientation of the proximal His residue. On the basis of these results it was concluded that the Fe-CO distortion is largely determined by the proximal residue but is fine-tuned by electrostatic interactions with the distal side.
Figure 3. The models applied in quantum mechanical calculations of the Mb-CO complex.
Relying on a similar hypothesis, INDO/S configuration interaction and Hartree-Fock calculations on an extended model of the protein environment were employed to investigate the effect of distorting the proximal residue on back-bonding and electrostatic interactions in the metMb-NO complex [77]. The active-site model contained the iron, the entire heme group and all residues in close contact: Phe43, His64 (distal histidine), Val68, Leu89, His93 (proximal histidine) and His97 (Figure 4).
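The bend and tilt angles that recur throughout this discussion can be computed from atomic coordinates as sketched below. The conventions are assumptions, since definitions differ between studies. Here the tilt is the angle between the Fe-C bond and the heme normal, and the bend is reported as the Fe-C-O angle itself (180° for a linear unit, matching the way a bend angle of 159.6° is quoted above). The heme normal is estimated as the normal of a least-squares plane through the four pyrrole nitrogens. All function names and coordinates are hypothetical.

```python
import numpy as np

def angle_deg(u, v) -> float:
    """Angle between two vectors in degrees."""
    cos_a = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0))))

def plane_normal(points) -> np.ndarray:
    """Unit normal of the least-squares plane through (N, 3) points."""
    pts = np.asarray(points, dtype=float)
    _, _, vt = np.linalg.svd(pts - pts.mean(axis=0))
    return vt[-1]  # direction of smallest variance

def fe_co_geometry(fe, c, o, heme_normal):
    """Return (tilt, bend) in degrees for an Fe-C-O unit."""
    fe, c, o, n = (np.asarray(x, dtype=float) for x in (fe, c, o, heme_normal))
    if np.dot(n, c - fe) < 0:  # orient the normal toward the ligand side
        n = -n
    tilt = angle_deg(c - fe, n)
    bend = angle_deg(fe - c, o - c)
    return tilt, bend

# Hypothetical geometry: Fe at the origin, Fe-C tilted by 10 degrees, O vertically above C
n_pyrrole = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, -2.0, 0.0]]
t = np.radians(10.0)
c = 1.8 * np.array([np.sin(t), 0.0, np.cos(t)])
o = c + np.array([0.0, 0.0, 1.15])
tilt, bend = fe_co_geometry([0.0, 0.0, 0.0], c, o, plane_normal(n_pyrrole))
print(f"tilt = {tilt:.1f}, bend = {bend:.1f}")  # tilt = 10.0, bend = 170.0
```

Reporting both numbers from the same coordinates makes it explicit whether a given distortion is carried mainly by the Fe-C bond (tilt) or by the C-O bond (bend).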
In the metMb-NO study [77], an internal coordinate, the Cα-Cβ-Cγ-Cδ2 torsion of the proximal histidine, was changed in 2° increments from -113° in the wild-type protein to -125°, the value found in various X-ray structures of pentacoordinated distal-side mutants of myoglobin [78]. A correlation was established between the binding character of the iron-heme active centre and the torsion angle of the proximal histidine. At -125° the charge distribution was rearranged and back-bonding to the ligand was weakened to an extent that might initiate release of the ligand from the iron. The calculations also confirmed the earlier experimental finding that the heme group serves as an antenna that communicates conformational variations of the protein to the ligand [79]. This communication between the proximal histidine and the ligand could be switched off completely by omitting protoporphyrin IX from the model.
Figure 4. Active-site model of the metMb-NO complex.
Point charges and the electric field at both the distal and proximal sites strongly affect the Fe-C and C-O bond indices as well as the C-O vibrational frequency. This was revealed in a detailed study by Kushkuley and Stavrov [80,81], who used INDO calculations to follow the effects of porphyrin ring distortion, the iron-imidazole distance, the displacement of the iron out of the porphyrin plane and electrostatics on the νCO frequency, the 17O isotropic chemical shift and the nuclear quadrupole coupling constant (Figure 3D). Only the charged groups were found to have a characteristic influence, which could be modulated by heme distortions. It was shown that the main contribution to CO activation stems from changes in the σ donation from CO to the iron and in the back-bonding from the iron to the 2π orbital of CO. These contributions, however, could be significantly perturbed only by introducing point charges in the equatorial, distal or proximal direction.
Turning again to the electrostatic effect of the distal side, Phillips and co-workers recently revisited the possible tautomeric states of the distal histidine [82]. The electrostatic properties of 20 different myoglobin CO and O2 complexes were studied using the linearised Poisson-Boltzmann method [83]. In all but two cases the coordinates were taken from crystal structures of the complexes. For each structure two protonation schemes were built, one with Nδ-protonated and one with Nε-protonated His64, and the electrostatic potential at both ligand atoms was calculated in every case. The charges of the protein residues were assigned according to the CHARMM22 parameter file; heme charges were not included, on the grounds that the contribution of the heme to the field is not affected by alterations of the distal pocket. For the Nε tautomer a correlation was established between the calculated electrostatic potential at the O atom of CO and the measured νCO frequency, and the potential difference between the C and O atoms also correlated reasonably well with this frequency. The δ-protonated tautomer, however, gave strongly negative potentials at both the C and O atoms, and no such correlation could be established. On the basis of this finding the authors concluded that ε-protonated His64 is favoured in Mb complexes, in contrast to the neutron scattering result [44]. A linear relationship was found between the logarithm of the dissociation rate constants of the O2, CO and NO complexes of the wild-type protein and of various distal-side mutants and the electrostatic potential at the terminal O atom of the ligand. For O2, kO2 decreases by four orders of magnitude over the potential range studied, with positive and negative fields increasing and decreasing the O2 affinity, respectively. The stretching frequency of bound CO could therefore be used to describe the polarity of the distal pocket and to predict the extent of electrostatic stabilisation of bound O2. The dissociation rate constants of the more apolar Fe-CO and Fe-NO complexes showed almost no dependence on the surrounding electrostatic field. Using the linear relationship between the electrostatic potential at the O atom of CO and νCO, the authors determined the values of the χ1 torsion angle of His64 that would create potentials corresponding to the measured A states of the complex: the A3, A1 and A0 states could be explained by Δχ1 values of -15°, 0° and >60°, respectively.
Ghosh and Bocian calculated the carbonyl tilting and bending potential energy surface of Mb-CO [84]. They constructed three different models, (porphine)Fe(Im)(CO) and two smaller complexes, all with Cs symmetry (cf. Figures 3B, 3C and 3D). In this case geometry optimisation did not reveal any connection between the binding geometry of the CO ligand and the orientation of the proximal histidine; it yielded a practically linear binding geometry for CO.
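The linear relationship reported by Phillips and co-workers between log k and the electrostatic potential at the ligand's terminal oxygen amounts to an ordinary least-squares fit. The sketch below shows the computation on invented, exactly linear data; the numbers are placeholders rather than values from the study, the potential may be in any consistent unit, and the function name is hypothetical.

```python
import numpy as np

def fit_log_rate(potentials, rate_constants):
    """Least-squares fit of log10(k) = a + b * phi; returns (a, b, r)."""
    phi = np.asarray(potentials, dtype=float)
    log_k = np.log10(np.asarray(rate_constants, dtype=float))
    b, a = np.polyfit(phi, log_k, 1)       # polyfit returns [slope, intercept]
    r = np.corrcoef(phi, log_k)[0, 1]      # Pearson correlation coefficient
    return a, b, r

# Hypothetical data: k falls by four orders of magnitude across the potential range
phi = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
k_off = 10.0 ** (3.0 - 1.0 * phi)
a, b, r = fit_log_rate(phi, k_off)
print(round(a, 3), round(b, 3), round(r, 3))
```

With real data the scatter, and hence |r|, indicates how much of the variation in log k the electrostatic potential alone accounts for; an almost flat dependence, as described for the apolar CO and NO complexes, would appear as a slope near zero.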
To obtain the potential energy surfaces of the models, single-point density functional calculations were carried out with the local exchange-correlation functional of Hedin and von Barth [85] at specified bend and tilt angles of CO within the range −15° to 15°. The surface was constructed by fitting 15 single-point energies with a general quadratic polynomial. The results show that in-phase displacement of the bend and tilt angles represents a low-energy pathway across the potential surface, as formerly suggested by Li and Spiro [86]. Ghosh and Bocian found an unexpectedly large negative tilt-bend interaction constant, which allows combined bend + tilt deformations of as much as 25° at a low energy cost of ~8 kJ/mol. Thus, distal pocket interactions that destabilise the CO ligand by only 8 kJ/mol could explain its experimentally detected distortion. The negative interaction constant determined for the tilt and bend angles also led to a reinterpretation of the vibrational features of the Fe-CO unit. On the other hand, the results of Oldfield and co-workers give no support for distorted (non-linear) binding of CO to Mb either in solution or in the solid phase [87]. 13C NMR chemical shifts and their anisotropies, 17O NMR chemical shifts and quadrupole couplings, 57Fe NMR chemical shifts and Mössbauer quadrupole splittings were measured in metalloproteins and model systems, and the results were compared with calculations carried out with various density functional methods on small models. The effect of ligand tilt and bend on the 13C and 17O shifts and shift anisotropies was calculated with sum-over-states density functional perturbation theory [88], using individual gauges for localised orbitals [89], on the (amidinato)(CO)(NMeIm) system. 57Fe NMR chemical shifts and electric field gradients at 17O and 57Fe were calculated as a function of ligand tilt and bend on a larger model comprising a porphyrin ring without its substituents, the axial imidazole and the CO ligand. The Fe-C and C-O bond lengths were optimised at fixed ligand tilt and bend angles using the B3LYP functional with Wachters' Fe basis set [90] and a mixture of 6-31G* and 3-21G basis sets for the remaining atoms. The authors found that experimental NMR and Mössbauer results on linear Fe-CO metalloporphyrins are essentially the same as those found in the A0 substate of heme CO complexes, and the density functional results [91] ruled out bent and tilted arrangements of CO. Similar results were obtained for the A1 state, where the Mössbauer splittings and the 57Fe shift could only be reproduced using a linear model of the Fe-CO unit. A quite different conclusion was drawn by Spiro and Kozlowski [92], who carried out gradient-corrected density functional calculations in an attempt to reconcile the conflicting experimental results of crystallography and infrared dichroism measurements. Based on infrared spectroscopic results, Anfinrud and co-workers [47] concluded that the C-O direction must lie within 7° of the heme normal, but the resulting O atom displacement is far from that determined by X-ray crystallography.
An imidazole-heme-CO model (cf. Figure 3D) was treated with the B3LYP hybrid functional. The analytical force constants thus deduced for tilt, bend and their interaction follow the same trend as those obtained by Ghosh and Bocian; however, the bend and interaction constants are ~40% lower. The authors question the common working hypothesis of infrared spectral analysis that the measured transition dipole coincides with the direction of the C-O bond vector. Calculating the transition dipole direction along the minimum energy distortion path of CO gave a direction much closer to the Fe-C than to the C-O bond vector. In this way the 6.9° transition dipole angle measured by infrared spectroscopy corresponds to a minimum energy structure with a 9.5° tilt angle and a 5.8° bend angle. This falls within the range determined by X-ray crystallography and requires a distortion energy of only ~4.0 kJ/mol. The conflict between the infrared spectroscopic and X-ray diffraction results therefore disappears, both methods measuring a tilted and bent equilibrium binding geometry for CO in Mb. On this basis, about 25% of the discrimination of myoglobin in favour of O2 can be attributed to steric hindrance of CO, and the remaining ~13 kJ/mol must come from the considerably more favourable electrostatic interaction between the distal histidine and O2.
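The tilt-bend surfaces discussed above are quadratic forms in the tilt angle τ and bend angle β. The sketch below, assuming a hypothetical set of coefficients (kJ/mol, degrees) rather than the published ones, fits the general quadratic E(τ, β) = aτ² + bβ² + cτβ + dτ + eβ + f to 15 single-point energies by linear least squares, builds the Hessian (force constants 2a and 2b, interaction constant c) and reports its softest eigenvector. With a negative interaction constant that eigenvector has components of the same sign, i.e. the in-phase tilt + bend displacement is the low-energy pathway.

```python
import numpy as np


def true_energy(tau, beta):
    """Hypothetical reference surface in kJ/mol (angles in degrees), used only to make test data."""
    return 0.05 * tau**2 + 0.04 * beta**2 - 0.07 * tau * beta


# 15 single points on a 5 x 3 grid within -15..15 degrees
tau_grid = np.array([-15.0, -7.5, 0.0, 7.5, 15.0])
beta_grid = np.array([-15.0, 0.0, 15.0])
tau, beta = (g.ravel() for g in np.meshgrid(tau_grid, beta_grid))
energy = true_energy(tau, beta)

# General quadratic: E = a*tau^2 + b*beta^2 + c*tau*beta + d*tau + e*beta + f
design = np.column_stack([tau**2, beta**2, tau * beta, tau, beta, np.ones_like(tau)])
coef, *_ = np.linalg.lstsq(design, energy, rcond=None)
a, b, c, d, e, f = coef

# Hessian: 2a and 2b are the tilt and bend force constants, c the interaction constant
hessian = np.array([[2 * a, c], [c, 2 * b]])
eigvals, eigvecs = np.linalg.eigh(hessian)  # eigenvalues in ascending order
soft_dir = eigvecs[:, 0]                    # lowest-curvature (softest) direction

print("force constants (kJ/mol/deg^2):", 2 * a, 2 * b, "interaction:", c)
print("softest direction (tau, beta):", soft_dir, "curvature:", eigvals[0])
```

Because the model is linear in its six coefficients, an ordinary least-squares solve is enough; no nonlinear optimiser is needed.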
4. (PHOTO)DISSOCIATION
Photolysis, photodissociation and rebinding of carbonmonoxy myoglobin have been investigated thoroughly, not only to understand this specific process but also because the system is an ideal model for studying the time scales and mechanisms of protein relaxation in general. A vast amount of experimental evidence is available on the photodissociation and rebinding kinetics of carbonmonoxy myoglobin [93], and the pH, solvent and temperature dependence of the process has been described [94]. The dissociation of Mb-CO is thought to proceed through a four-step pathway [95]:
MbCO (bound, A states) <=> Mb:CO (distal pocket, B states) <=> Mb::CO (protein matrix) <=> Mb:CO (solvent)
or, according to the side path model [96]:
MbCO (bound, A states) <=> Mb:CO (distal pocket, B states) <=> Mb:CO (solvent), with a side branch from the B states: Mb:CO (distal pocket, B states) <=> Mb:CO (protein matrix, secondary site)
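Sequential schemes like the four-step pathway above are linear first-order kinetics, dp/dt = Kp, and can be solved exactly. The sketch below assumes hypothetical rate constants in ns⁻¹ (ligand re-entry from the solvent treated as pseudo-first-order), builds the rate matrix for the chain bound <=> distal pocket <=> protein matrix <=> solvent, starts all population in the B state as after photolysis, and propagates it through the eigendecomposition of K. It is illustrative only and does not reproduce the measured kinetics of [95].

```python
import numpy as np

# States: 0 = A (bound), 1 = B (distal pocket), 2 = protein matrix, 3 = solvent.
# Hypothetical first-order rate constants in ns^-1 (re-entry from solvent is
# treated as pseudo-first-order at fixed CO concentration).
k_forward = [0.01, 0.5, 0.2]   # A->B, B->matrix, matrix->solvent
k_backward = [5.0, 0.1, 0.05]  # B->A, matrix->B, solvent->matrix

n_states = 4
# Off-diagonal K[j, i] = rate constant for i -> j; diagonal = minus total outflow
K = np.zeros((n_states, n_states))
for i, (kf, kb) in enumerate(zip(k_forward, k_backward)):
    K[i + 1, i] += kf
    K[i, i] -= kf
    K[i, i + 1] += kb
    K[i + 1, i + 1] -= kb

eigvals, V = np.linalg.eig(K)
V_inv = np.linalg.inv(V)


def populations(t_ns, p0):
    """Exact solution of dp/dt = K p via the eigendecomposition of K."""
    return (V @ np.diag(np.exp(eigvals * t_ns)) @ V_inv @ p0).real


# Equilibrium from detailed balance: p[i+1] / p[i] = kf[i] / kb[i]
p_eq = np.cumprod([1.0] + [kf / kb for kf, kb in zip(k_forward, k_backward)])
p_eq /= p_eq.sum()

p0 = np.array([0.0, 1.0, 0.0, 0.0])  # photolysis leaves the ligand in the B state
for t in (0.0, 1.0, 10.0, 100.0):
    print(f"t = {t:6.1f} ns:", np.round(populations(t, p0), 3))
```

The side path model differs only in the connectivity of K (the secondary site couples to the B state instead of lying on the escape route), so the same solver applies with a different matrix.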
The infrared stretching frequency of CO has been measured both in the heme-bound state and after photodissociation [53, 97]. In the ligated state three distinct ligand stretching frequencies can be resolved; these are the A states. After dissociation at low temperature three distinct CO stretching frequencies have again been identified; these are known as the B states. The B states evolve within the first 300-500 fs following photolysis, and the spectrum remains practically unchanged up to 1 ns [98]. Within this time frame 85% of the ligand has been shown to remain in the heme pocket, so it is reasonable to suspect that conformational changes and the charge distribution of the protein matrix induce the observed frequency shifts in both the A and B states. The early events following photodissociation have also been studied by time-resolved femtosecond electronic spectroscopy, which established that a state analogous to deoxymyoglobin is formed within 350 fs, suggesting that the iron has been displaced from the porphyrin plane within this time [99]. In a molecular dynamics study Henry et al. [100] showed that the protein has no effect on the fast (estimated half time between 50 and 150 fs) out-of-plane movement of the iron following dissociation of the complex. Low temperature X-ray crystallography has also been applied to study the photodissociation of Mb-CO. Two different positions of the dissociated CO molecule were determined by Schlichting et al. [101] and by Teng et al. [102], while Hartmann et al. [103] observed both positions (Figure 5). The two experimentally determined positions indicate that the CO molecule may flip within its geminate binding site. Femtosecond time-resolved infrared spectroscopic measurements also showed CO trapped within the geminate cavity in two opposite orientations, with an estimated energy difference of about 1.2 kJ/mol between the two states [104]. This agrees with the earlier finding of Alben et al. that 13C16O rebinds more slowly at 20 K than 13C18O, suggesting that rotation about an axis perpendicular to the CO bond is involved in the rebinding process [105]. In all cases the CO was found in a hydrophobic pocket bordered by Leu29, Val68 and Ile107 (Figure 6).
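The 1.2 kJ/mol energy difference between the two docked orientations of CO can be put in perspective with a two-state Boltzmann estimate. The sketch below assumes that the two orientations are in thermal equilibrium, equally degenerate and free of entropic differences, none of which is established by [104]; it returns the equilibrium fraction of the higher-energy orientation.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def minor_fraction(delta_e_kj, temperature_k):
    """Equilibrium fraction of the higher-energy state in a two-state Boltzmann model.

    Assumes equal degeneracy and no entropy difference between the two orientations."""
    ratio = math.exp(-delta_e_kj / (R_KJ * temperature_k))
    return ratio / (1.0 + ratio)


for T in (100.0, 200.0, 300.0):
    print(f"{T:5.0f} K: fraction of higher-energy orientation = {minor_fraction(1.2, T):.3f}")
```

At 300 K this gives roughly 0.38, so under these assumptions both orientations would be well populated at room temperature.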
Figure 5. Superimposition of the crystal structures of the photolysed CO complexes of myoglobin. For clarity, the heme group and the nearby residues Leu29, His64, Val68, His93 and Ile107 are shown only once.
Figure 6. The geminate trap of small molecular ligands, bordered by Leu29, Val68 and Ile107.
Straub and Karplus [106] presented molecular dynamics simulations of the short-time behaviour of CO after dissociation, considering the distal histidine both in its charged form and in its neutral form protonated at Nδ. The initial model was built from the crystal structure of the Mb-CO complex [43]. Atoms within 12 Å of the central region were completely free to move, atoms outside the 12 Å shell but inside 16 Å were restrained according to their average X-ray temperature factors, and atoms outside 16 Å were held fixed. The complex was solvated by 153 explicit TIP3P [73] water molecules. CO was described by a three-site model, obtained by adding a Lennard-Jones interaction site and a point charge at the centre of mass of the molecule. Photodissociation was simulated by switching the parameters from those of carbonmonoxymyoglobin to those of deoxymyoglobin and switching off the bond length and angle potentials between the ligand and the heme. Following equilibration, ten 10 ps dissociative trajectories were calculated for each protonation state of the distal histidine. Within the time frame studied, the simulations showed the formation of one dominant dissociated species for the unprotonated and one for the protonated state of the distal histidine. It was concluded from the trajectories that the excess initial centre-of-mass energy is transferred to active site residues in the first few collisions after dissociation of the CO ligand. The centre-of-mass kinetic energy relaxed within 300 fs, and the rotational energy within 100 fs for uncharged His64 and within 300-400 fs for charged His64. In the neutral His64 trajectories the ligand was found perpendicular to the heme plane, forming a dative bond with the His64 Nε lone pair, while with protonated distal histidine a hydrogen bond between the carbon atom of CO and the NεH of the histidine side chain was detected within the first ps.
In the MD study of Petrich et al. [67], the non-exponential relaxation of the Mb-CO complex after photolysis was attributed to the presence of inhomogeneous, time-dependent populations of protein molecules. Four independent dissociative trajectories were started from the co-ordinates of an equilibrium Mb-CO trajectory. Photolysis was modelled by switching the heme parameters to those of a pentacoordinate domed heme group; all heme-CO interactions were turned off except for an r⁻¹² repulsion term for the Fe...C interaction, introduced to mimic the motion of CO on the repulsive hypersurface of the dissociative excited state. Points where the excited state and ground state surfaces cross (monitored by the difference between the energies calculated with the modified and the standard potential sets) were used to start 100 ps simulations of the photodissociated ground state system. The excited state surface crossed over to the ground state surface within 150 ps in all four cases. An ultrafast process with relaxation times of 30 fs to 70 fs was identified in all four trajectories, corresponding to the out-of-plane displacement of the iron coupled to local adjustments of the heme. The slower process of structural and energetic relaxation was followed as a function of z(t), the distance of the iron from the plane of the heme heavy atoms. z(t) behaved very differently in the four trajectories, suggesting an inhomogeneous component in the relaxation process. The final out-of-plane displacement of the iron in the four trajectories was around 0.6 Å. The results were fitted equally well by a four-parameter power law and by a five-parameter two-exponential function, using a starting value for z(t) at t = 50 fs of 0.48 Å.
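The out-of-plane co-ordinate z(t) used by Petrich et al. is the distance of the iron from a plane through the heme heavy atoms. A common way to define such a plane, assumed here rather than taken from [67], is a least-squares fit via the singular value decomposition of the centred coordinates: the normal is the right singular vector with the smallest singular value. The toy ring of eight atoms and the 0.4 Å iron displacement below are hypothetical.

```python
import numpy as np


def fit_plane(points):
    """Least-squares plane through 3D points: returns (centroid, unit normal).

    The normal is the right singular vector belonging to the smallest singular
    value of the centred coordinates."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[-1]


def signed_distance(point, centroid, normal):
    """Signed distance of a point from the plane (sign follows the normal's direction)."""
    return float(np.dot(np.asarray(point, dtype=float) - centroid, normal))


# Hypothetical toy frame: 8 "heme heavy atoms" on a planar ring and an iron 0.4 A above it
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
ring = np.column_stack([2.0 * np.cos(angles), 2.0 * np.sin(angles), np.zeros(8)])
iron = np.array([0.0, 0.0, 0.4])

centroid, normal = fit_plane(ring)
z = abs(signed_distance(iron, centroid, normal))
print(f"iron out-of-plane distance: {z:.3f} A")
```

Applied frame by frame to a trajectory, this yields a z(t) series; orienting the normal consistently (for example toward the proximal histidine) keeps the sign meaningful across frames.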
The evolution of ⟨z(t)⟩ is in qualitative agreement with experimental results for the Mb-NO complex, as well as with the measured frequency shift of band III over the time range of the simulations. Karplus and co-workers [107] found considerable heterogeneity in the CO distribution in the heme pocket following dissociation. Twenty-eight 10 ps dissociation trajectories were started from different points of a solvated, equilibrated Mb-CO trajectory. CO was treated with the three-site model described above [106]. Dissociation was simulated by switching the heme parameters to those of the pentacoordinate heme and omitting the bond stretching and bending interactions between the heme and the ligand. The averaged positions of the CO molecule reproduced the crystallographic ones: the ligand was restricted to a thin slab parallel to the heme plane at a distance of 3.2 ± 0.3 Å, while the crystallographically determined values range between 2.8 and 3.1 Å. Within this plane the CO molecule had more freedom, but it stayed mainly over the N_C pyrrole ring of the heme. After dissociation the distal histidine moves slightly into the pocket owing to favourable interactions with the free CO. The calculations also confirm the experimental finding that the CO molecule may rotate so that its oxygen comes closer to the iron. This orientation, however, greatly slows down recombination, since the ligand would have to flip before rebinding.
Straub and co-workers [108] carried out a room temperature analysis of the equilibrium B states, which evolve considerably later after photolysis. Starting points for four 10 ps MD simulations were taken from a 1.5 ns trajectory of fully solvated, pH-neutral deoxymyoglobin generated at room temperature. Deoxymyoglobin was chosen as the initial model because, according to infrared spectroscopic measurements, the B1 state continues to shift non-exponentially in frequency until about 10 ns after photodissociation. Only after this 10 ns period can representative structures of the equilibrium B state be obtained, and it is also known that a deoxymyoglobin-like structure evolves within 350 fs of the dissociation event [109]. Two starting conformations were chosen with the distal histidine pointing outward (α trajectories) and two with it pointing into the pocket (β trajectories). The system was solvated by approximately 3000 TIP3P water molecules. CO was described by the three-site model of Straub and Karplus [106], supplemented by the first-order dipole of CO induced by the electric field of the protein and the solvent. The results showed the CO molecule in the B (equilibrium photodissociated) state to be close to the binding site, with a definite bias toward the C ring. In both the α and β trajectories the ligand was loosely confined to a layer parallel to the heme plane, 4 to 6 Å from it. The quite broad distribution indicates only moderately hindered motion.
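Straub and co-workers extended the three-site CO model with a first-order induced dipole. For a linear molecule this is μ_ind = α·E with an axially symmetric polarisability tensor α = α⊥I + (α∥ − α⊥)uuᵀ, where u is the unit bond vector, and the associated polarisation energy is −½ μ_ind·E. The sketch below assumes arbitrary consistent units and hypothetical polarisabilities, and omits any self-consistent coupling between induced dipoles.

```python
import numpy as np


def polarizability_tensor(bond_axis, alpha_par, alpha_perp):
    """Axially symmetric polarisability of a linear molecule along unit vector u."""
    u = np.asarray(bond_axis, dtype=float)
    u = u / np.linalg.norm(u)
    return alpha_perp * np.eye(3) + (alpha_par - alpha_perp) * np.outer(u, u)


def induced_dipole(field, bond_axis, alpha_par, alpha_perp):
    """First-order induced dipole mu = alpha . E and polarisation energy -1/2 mu . E.

    Arbitrary consistent units; no self-consistent iteration between dipoles."""
    alpha = polarizability_tensor(bond_axis, alpha_par, alpha_perp)
    e_vec = np.asarray(field, dtype=float)
    mu = alpha @ e_vec
    return mu, -0.5 * float(mu @ e_vec)


# Hypothetical polarisabilities and a field at 45 degrees to the C-O axis
mu, energy = induced_dipole(field=[1.0, 0.0, 1.0], bond_axis=[0.0, 0.0, 1.0],
                            alpha_par=2.0, alpha_perp=1.5)
print("induced dipole:", mu, "polarisation energy:", energy)
```

Because α∥ differs from α⊥, the induced dipole is not parallel to the field unless the field lies along or perpendicular to the bond.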
The contribution of each residue to the electric field experienced by the ligand was calculated for all four trajectories. In all structures studied, half of the electric field was contributed by the proximal His and the heme group. The calculations indicate that to describe the total electric field at the ligand, a model should include His93 (proximal histidine), Gly65, Leu61, His64 (distal histidine), the heme and the solvent. A comprehensive test of computational protocols for the short-time dynamics of the photolysed Mb-CO complex was presented by Meller and Elber [110]. 270 different 10 ps molecular dynamics simulations were carried out using two different solvation boxes, two different types of electrostatic cutoff and two different treatments of the photodissociated ligand, for both the wild-type protein and the Leu29Phe mutant. Nine different setups were assembled from these variables and 30 trajectories were generated for each; the results presented are averages over these 30 trajectories. The calculations used a combination of the AMBER [111] and OPLS [112] force fields, the heme model of Kuczera et al. [16] and approximately 2700 TIP3P water molecules for solvation. The CO orientational angle with respect to the heme normal was found to relax to its equilibrium value of ~100° within 1 ps. The spread of the population is large, with fluctuations of up to 60°. The effect of the mutation on this process is quite small, comparable to the difference caused by using a different solvation box. Two orientations of the CO molecule in the plane parallel to the heme were found, which the authors suggest correspond to the B states; the B1 and B2 states differ only in their head-to-tail orientation. With the smaller solvent box only one of these conformers could be identified. The average and most probable positions of CO are almost the same for the mutant and the wild-type protein. The spatial distribution of CO within the heme pocket is wide and almost independent of the electrostatic truncation scheme, but it depends somewhat on the model of CO applied. The authors propose that ignoring the quadrupole moment of CO leads to more rapid diffusion of the small ligand inside the protein. In the calculated infrared spectra the line widths are similar to the experimental ones, but the line separation is considerably underestimated. The conclusion drawn is that an accurate calculation of spectroscopic properties requires the three-site CO model and an extensive solvation shell. Anfinrud and co-workers proposed that the difference in binding rate between CO and O2 is due to the unfavourable orientation of CO upon rebinding [47,52]. However, if geminate trapping and rebinding influence the equilibrium binding affinity of the ligand in question, there must exist a protein conformational change with a low energy barrier that produces the same ligand response as laser flash photolysis of the complex.
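The residue-by-residue decomposition of the electric field at the ligand described above can be sketched as a Coulomb sum grouped by residue, then projected onto the C-O bond axis (the component usually related to frequency shifts). This is a minimal illustration assuming bare point charges in vacuum with the Coulomb prefactor set to 1; it ignores cutoffs, dielectric screening and the solvent treatment used in [108], and the residue labels and charges in the example are hypothetical.

```python
import numpy as np
from collections import defaultdict

COULOMB_PREFACTOR = 1.0  # 1/(4*pi*eps0) set to 1: arbitrary units (assumption)


def field_by_residue(probe, coords, charges, residue_ids):
    """Coulomb field at `probe` from point charges, accumulated per residue label.

    No cutoff, no dielectric screening, no solvent: a bare vacuum sum."""
    probe = np.asarray(probe, dtype=float)
    fields = defaultdict(lambda: np.zeros(3))
    for pos, q, res in zip(np.asarray(coords, dtype=float), charges, residue_ids):
        d = probe - pos
        fields[res] = fields[res] + COULOMB_PREFACTOR * q * d / np.linalg.norm(d) ** 3
    return dict(fields)


def project_on_axis(fields, bond_axis):
    """Component of each residue's field along the (normalised) C-O bond axis."""
    u = np.asarray(bond_axis, dtype=float)
    u = u / np.linalg.norm(u)
    return {res: float(e_vec @ u) for res, e_vec in fields.items()}


# Hypothetical toy system: two charged sites and a probe at the CO centre
coords = [[0.0, 0.0, 0.0], [0.0, 0.0, 4.0]]
charges = [1.0, -1.0]
labels = ["HIS64", "GLY65"]
per_residue = project_on_axis(field_by_residue([0.0, 0.0, 2.0], coords, charges, labels),
                              [0.0, 0.0, 1.0])
total = sum(per_residue.values())
print(per_residue, "total:", total)
```

Dividing each entry by the total gives fractional contributions, the kind of quantity behind the statement that half of the field comes from the proximal His and the heme.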
Keserű and Menyhárd studied NO dissociation from metMb from this point of view [113]. Monte Carlo multiple minimum searches and 500 ps molecular dynamics simulations were carried out to simulate the effect of the changing charge distribution coupled to the torsional flip of the proximal His93. The charge distribution of the active site of the metMb-NO complex was calculated [77] for two torsional angles of the proximal histidine: −113°, as seen in the hexacoordinate wild-type structures, and −125°, as observed in pentacoordinate distal side mutants. The results were used in two independent conformational searches of the complex. The model was built from the crystal structure of metmyoglobin. All backbone atoms were constrained, and the heme unit and all atoms outside a 10 Å sphere around the central iron were tethered to their original positions with a force constant of 20 kJ/Å². The system was solvated with the continuum method of Still et al. [114]. Calculations employed the AMBER* force field [115]. The search involved 1000 Monte Carlo steps, and the low energy conformers obtained were compared by the orientation of the substrate relative to the heme unit. The heme parametrisation was not changed, and no angle or bond potentials were defined between the ligand and the heme in either simulation. Thus, all differences between the two sets of calculations are due to the different electrostatic environments belonging to the two orientations of His93. Both the Monte Carlo search and the molecular dynamics simulation located one low energy conformation of NO for the −113° torsion of the proximal histidine, in which NO was bound directly to the iron in a slightly bent conformation. With the charge set belonging to the −125° torsional angle, the Monte Carlo search found two conformations of virtually identical energy. In both, the NO molecule was found within the nearby hydrophobic geminate trap (analogous to that determined for the CO molecule), in two opposite orientations. The simulation located only one of these conformers and was also unable to sample higher energy but still relevant arrangements. The torsional flip of His93 was therefore identified as an equilibrium conformational change of metmyoglobin capable of inducing the release and geminate trapping of NO, a behaviour analogous to the photodissociation of CO upon laser flash photolysis.
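Monte Carlo multiple minimum searching alternates random structural kicks with local minimisation and keeps each new minimum found. The toy sketch below applies that idea to a single hypothetical torsion-like co-ordinate θ with two near-degenerate minima, standing in for the two NO orientations; the potential, step sizes and restart rule (a uniformly chosen stored minimum) are assumptions for illustration and are not the MacroModel implementation or the AMBER* energy surface used in [113].

```python
import math

import numpy as np

TWO_PI = 2.0 * math.pi


def energy(theta):
    """Hypothetical 1D torsion-like profile with two near-degenerate minima."""
    return math.cos(2.0 * theta) + 0.05 * math.sin(theta)


def gradient(theta):
    return -2.0 * math.sin(2.0 * theta) + 0.05 * math.cos(theta)


def local_minimize(theta, step=0.1, n_iter=500):
    """Plain steepest descent; the result is wrapped to [0, 2*pi)."""
    for _ in range(n_iter):
        theta -= step * gradient(theta)
    return theta % TWO_PI


def angular_gap(a, b):
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def multiple_minimum_search(n_steps=200, max_kick=math.pi, tol=1e-3, seed=0):
    """Random kick + local minimisation; store every new minimum; restart from a random stored one."""
    rng = np.random.default_rng(seed)
    minima = [local_minimize(0.3)]
    current = minima[0]
    for _ in range(n_steps):
        trial = local_minimize(current + rng.uniform(-max_kick, max_kick))
        if all(angular_gap(trial, m) > tol for m in minima):
            minima.append(trial)
        current = minima[rng.integers(len(minima))]
    return sorted(minima, key=energy)


for m in multiple_minimum_search():
    print(f"theta = {math.degrees(m):7.2f} deg, E = {energy(m):.3f}")
```

Keeping every distinct minimum, not just the lowest, is what lets such a search report two conformations of nearly equal energy, whereas a single MD trajectory tends to stay in one basin.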
5. RECOMBINATION
Globins are typically characterised by reversible binding of molecular oxygen and of other small ligands such as CO and NO. Although temperature-dependent protein fluctuations have been investigated effectively by X-ray and NMR experiments (cf. Section 2), these studies did not reveal how small diatomic ligands escape from and rebind to the protein. Time-resolved absorption spectroscopy proved useful for exploring the effect of fluctuations on the reactivity of the protein, and a number of such studies investigated ligand recombination after photodissociation on nanosecond to second time scales [9,93,116]. Geminate recombination of CO to sperm whale myoglobin at low temperatures was observed by Frauenfelder et al. using laser photolysis [117]. The non-exponential nature of CO rebinding observed at low temperatures was explained by a distribution of barriers arising from the slow equilibration of different conformational substates [94]. Such a barrier distribution appears when transitions between conformational substates are slower than rebinding. At higher temperatures, however, recombination becomes exponential, which can be attributed to a single barrier. In analysing the non-exponential character of rebinding, both equilibrium and non-equilibrium motions should be considered. Equilibrium motions, which occur within a well of the potential energy surface, are responsible for the fluctuations observed at low temperatures and result in an inhomogeneous protein population.
At higher temperatures transitions between these wells are superimposed on harmonic fluctuations [3,21]. Non-equilibrium motions are due to relaxation toward equilibrium. Since photodissociation creates a non-equilibrium state of the protein, both types of motion can take part in the recombination process. In fact, Ansari et al. concluded [94] that the low temperature non-exponential behaviour of CO rebinding arises from an inhomogeneous protein population rather than from a homogeneous one with multiple binding sites [93]. Although CO rebinding kinetics are clearly exponential at room temperature, it has been suggested that protein relaxation might play a role in rebinding kinetics even at intermediate temperatures. This phenomenon was investigated by Petrich et al., who used molecular dynamics to explore the dissociation behaviour of carbonmonoxy myoglobin [67]. Henry et al. [100], in one of the first molecular dynamics simulations of an ultrafast laser photolysis experiment (that of Martin et al. [109]), concluded that the iron atom can be displaced from the heme plane. Petrich et al. therefore analysed the trajectory of the iron-proximal histidine co-ordinate, whose transient behaviour suggested that it is involved in geminate recombination. The authors found that the iron-histidine distance is primarily modulated by protein relaxation and can therefore be an important factor regulating the distribution of rebinding rates. Analysis of the out-of-plane distance of the iron relative to the nitrogen atoms of the porphyrin ring revealed its possible role in non-exponential binding. Although this distance fluctuates by only 0.05-0.06 Å in the equilibrium state formed after photodissociation, the out-of-plane distance of the iron was found to increase by 0.15 Å on a 1-100 ps time scale.
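The barrier-distribution explanation of low-temperature non-exponential rebinding can be written as N(t) = ∫ g(H) exp[−k(H)t] dH with an Arrhenius rate k(H) = A exp(−H/RT). The sketch below evaluates this integral numerically on a grid, assuming a Gaussian g(H) and hypothetical values for A, the mean barrier and the width; with zero width it reduces to the single-exponential decay expected for a single barrier.

```python
import numpy as np

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def survival(times_s, temperature_k, h_mean, h_width, prefactor=1e9, n_grid=2001):
    """Unrebound fraction N(t) for a Gaussian distribution g(H) of barrier heights.

    N(t) = integral g(H) exp(-k(H) t) dH with k(H) = prefactor * exp(-H / RT).
    H in kJ/mol, t in s, prefactor in s^-1; all parameter values are hypothetical."""
    t = np.atleast_1d(np.asarray(times_s, dtype=float))
    if h_width == 0.0:  # single barrier: plain exponential
        k = prefactor * np.exp(-h_mean / (R_KJ * temperature_k))
        return np.exp(-k * t)
    h = np.linspace(h_mean - 5.0 * h_width, h_mean + 5.0 * h_width, n_grid)
    g = np.exp(-0.5 * ((h - h_mean) / h_width) ** 2)
    g /= g.sum()  # normalise the discretised distribution
    k = prefactor * np.exp(-h / (R_KJ * temperature_k))
    return np.exp(-np.outer(t, k)) @ g


times = np.array([0.0, 1e-7, 1e-6, 1e-5, 1e-4])
broad = survival(times, 40.0, h_mean=3.0, h_width=1.0)
single = survival(times, 40.0, h_mean=3.0, h_width=0.0)
for t, nb, ns in zip(times, broad, single):
    print(f"t = {t:.0e} s   distribution: {nb:.3e}   single barrier: {ns:.3e}")
```

With a finite width, the high-barrier tail keeps part of the population unrebound long after the single-barrier curve has decayed, which produces the stretched, non-exponential time course.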
This indicates that the out-of-plane motion of the iron may give rise to a time-dependent barrier that ultimately results in non-exponential rebinding kinetics. Since CO rebinding at room temperature (~100 ns) is slower than protein relaxation (up to 100 ps), this hypothesis is not easy to test experimentally. In contrast to CO, NO rebinding was found to be non-exponential at higher temperatures as well, and it takes place on a faster time scale (up to 300 ps) than that of CO. Indeed, geminate rebinding of NO, discovered by Hochstrasser et al. [118], is one of the fastest known biochemical reactions. Protein relaxation is therefore likely to occur on the same time scale as recombination, which allows rebinding kinetics to be investigated under physiological conditions. Since the steric [119,120] and electrostatic [118,121] barriers to recombination identified for CO and O2 are diminished for NO, rebinding studies with this ligand can provide direct information on transient structural effects in rebinding at room temperature. Furthermore, NO rebinding has been shown to be virtually independent of ligand concentration, which implies that pure geminate recombination is involved [109]. Given these advantages of NO, it is not surprising that most recent studies used this ligand to investigate rebinding at the molecular level. Based on spectroscopic [122], X-ray [43] and Raman studies, Campbell et al. concluded that two possible conformations of the distal histidine control the motion of the ligand in sperm whale myoglobin [97,123]. This hypothesis explains the observed difference in NO rebinding between sperm whale and elephant myoglobins: in elephant myoglobin the critical distal histidine is replaced by a glutamine with a single conformation, consistent with the single-exponential recombination of NO in this protein. Although the comparison of NO recombination kinetics in sperm whale and elephant myoglobins supported the influence of the distal histidine on NO rebinding, LaMar et al. [124] showed that there are many other differences between the binding pockets of these proteins. Structural effects on the proximal side of the heme have also been proposed to be responsible for the kinetics of NO rebinding observed in globins [67]. The drastic reduction of the NO rebinding rate in inositol hexaphosphate (IHP) bound hemoglobin suggested that proximal constraints, such as those imposed by IHP, have a characteristic effect on ligand rebinding. A possible cause of this proximal effect is the out-of-plane motion of the iron from the plane of the pyrrole nitrogens of the heme, which decreases the rate of recombination [125]. An alternative explanation of heterogeneous rebinding time courses was based on the proposed diffusion of ligands away from the heme centre: rebinding of these molecules should take longer than that of molecules remaining close to the heme pocket. Heterogeneity observed during the experimental time course can be rationalised by two different theories: the diffusive protein co-ordinate model of Elber and Karplus [126] and the substate mechanism suggested by Frauenfelder and Wolynes [120].
Analysis of 50 ps room temperature simulations of the dissociated NO complex of the native protein [127] revealed that trajectories differing only in their initial velocities behave significantly differently. This result suggested that the heterogeneity observed in the experimental time course might be due to rebinding within different conformational substates, as proposed by Frauenfelder and Wolynes [120]. A correct description of picosecond recombination therefore requires multiple protein trajectories rather than a single calculation on an equilibrium protein structure taken to represent the ensemble average. Experimentally, mutant myoglobins can be used to test whether ligand diffusion influences geminate recombination, as suggested by theoretical calculations. One of the first rebinding studies on myoglobin mutants was published by Gibson et al. in 1991 [128]. Replacing the critical distal histidine (His64) with Val, Leu, Phe or Gln has little effect on NO recombination, which is consistent with the proposal of Petrich et al. that picosecond ligand recombination depends primarily on the reactivity of the iron atom [67]. Replacement of Leu29 with Ala, Val and Phe, however, caused significant changes in the rate of rebinding [127]. X-ray data demonstrated that the mutants are remarkably isomorphous with the wild-type protein except for the size of the side chain of the mutated residue [129]. The average structure of the heme, the iron-proximal histidine bond length and the interaction between the bound ligand and the distal histidine were found to be virtually identical to those of the native form. The authors therefore concluded that changes in recombination properties are most likely due to changes in ligand diffusion within the distal cavity. This view was further supported by a series of molecular dynamics simulations, in which trajectories of photodissociated ligands were calculated up to 50 ps and analysed to approximate the rebinding rates observed for the mutants. Although the simulations were expected to correlate with the observed picosecond time course of NO recombination, significant differences were detected. These differences were explained by the lack of a specific iron-ligand potential function able to account for ligand orientation and spin state. Another, more significant problem is the short simulation time, since 50 ps trajectories started from a limited set of initial velocities are unlikely to sample all possible ligand pathways. If the relevant protein conformations interconvert rapidly relative to the diffusive motion of the ligand, the ligand can follow all the pathways of escape and rebinding. If, instead, ligand motion is limited by the protein conformation existing at the moment of dissociation, rebinding would be characterised by slowly interconverting protein conformations that appear frozen relative to the rapid movement of the ligand.
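Treating each trajectory as an independent first-order rebinding channel shows why a single trajectory, or a single averaged rate, cannot represent the ensemble. In the sketch below each of ten hypothetical trajectories is assigned its own rate constant (how such rates would be extracted from MD, for example from residence time near the iron, is left as an assumption), and the ensemble time course is the average of the individual exponentials.

```python
import numpy as np


def ensemble_time_course(rates, times):
    """Mean unrebound fraction over trajectories, each rebinding with its own first-order rate."""
    rates = np.asarray(rates, dtype=float)
    times = np.asarray(times, dtype=float)
    return np.exp(-np.outer(times, rates)).mean(axis=1)


rng = np.random.default_rng(1)
# Hypothetical per-trajectory rate constants in ps^-1 (one per simulated trajectory)
rates = rng.lognormal(mean=np.log(0.05), sigma=1.0, size=10)
times = np.linspace(0.0, 200.0, 5)  # ps

course = ensemble_time_course(rates, times)
single_rate = np.exp(-rates.mean() * times)  # one "average" trajectory
for t, c, s in zip(times, course, single_rate):
    print(f"{t:6.1f} ps  ensemble: {c:.3f}  mean-rate exponential: {s:.3f}")
```

Because exp(−kt) is convex in k, the ensemble average never falls below the decay predicted by the mean rate, so heterogeneity among trajectories always appears as slower, non-exponential long-time rebinding.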
In fact the authors found that the pattern of initial atomic velocities directs each trajectory to a particular protein conformation, which determines the allowed ligand positions and pathways. Assuming first-order recombination, hypothetical time courses were calculated for all of the mutants, and these reproduced the experimental order of rebinding. Although the differences in rebinding properties of the myoglobin mutants were successfully rationalised on the basis of simulations, an interesting anomaly between short and long time scale NO rebinding was also identified. Picosecond recombination rates follow the order Phe29 > Leu29 (wild type) > Val29 > Ala29, whereas on the nanosecond time scale rebinding in the Phe29 mutant was found to be slower than in the Ala29 mutant. Molecular dynamics could, however, also be used to explore this phenomenon. Short time simulations performed by Li et al. [130] revealed that Phe29 squeezes the ligand against the heme, so recombination should be fast. Long time simulations, however, showed that the phenyl ring prevents diffusion of the ligand into the nearest cavity (B/G contact) and forces it into another pocket (E/F corner or CD loop). Rebinding from these latter sites is associated with a barrier, and recombination is therefore hindered. Since alanine does not prevent diffusion into the nearest cavity, the recombination rate of this mutant is higher than that of the Phe29 mutant. The effect of mutations was further investigated by Carlson et al., who analysed the NO recombination properties of a number of double mutant myoglobins [68]. Although their previous study suggested that the rate of recombination depends on the volume available to ligands in the heme pocket, experiments with double mutants gave unexpected results. When preparing the His64Gly/Val68Ala double mutant, Egeberg et al. [131] expected that the increased volume of the distal pocket would facilitate ligand movement, so that the rate of geminate recombination would be lower than in the native protein. In fact, however, the rate of recombination was greater than in native myoglobin. Another double mutant, His64Gly/Val68Ile, which enlarges the heme pocket less than His64Gly/Val68Ala, rebound much more slowly instead of faster. The results obtained for these double mutants illustrated another fundamental issue, namely the role of the distal His64 in ligand binding, as previously suggested by Campbell et al. [97]. X-ray diffraction studies on native myoglobin revealed that His64 is located near the surface of the protein, and it has been proposed that this residue functions as a gate controlling access and escape of the ligand [132,133]. The unexpected rebinding results obtained for double mutant myoglobins inspired Gibson et al. to perform molecular dynamics simulations [127]. The crystal structure of the His64Gly/Val68Ala mutant was determined and used as the starting structure for the simulations. Comparison of the X-ray structure of the double mutant with that of the native protein revealed no major rearrangement; since a methyl group of the Val68 side chain was removed, the binding pocket collapsed slightly.
Upon mutation of the distal His64, however, a well-ordered water molecule was found positioned to interact with the ligand. The ligand-accessible volumes of the mutant and the native protein were found to be similar, suggesting that the effect of the mutations cannot be due to simple volume effects. Molecular dynamics simulations of the His64Gly/Val68Ala and His64Gly/Val68Ile mutants, as well as of the native protein, were performed in water, including all of the crystallographic water molecules. Ten 50 ps trajectories were calculated for each protein. Analysis of the trajectories obtained for the mutants, and comparison with those of the native protein, revealed that steric hindrance plays a major role in determining the recombination rate. Recombination in the mutants may be explained in terms of the fluctuating free volume and structure of the heme pocket. Although the distal His64 usually forms stabilising interactions with the ligand, the authors argued that its kinetic effect is just the opposite. They found that steric effects on ligand rebinding depend mainly on the positions of the side chains on the distal side. This finding is in contrast to the proposal of Petrich et al. [67], which attributed non-exponential kinetics to proximal effects. The most important result of this mutant study is its emphasis on the role of solvation in geminate rebinding. Simulations of solvated proteins demonstrated that the solvation shell around the protein creates a secondary barrier to the escape of the ligand. Since residue 64 is in direct contact with the external solvent, a significant effect of solvation in the mutants is not unexpected. Water molecules of the first solvation shell can replace mutated surface residues (cf. the X-ray structure of the His64Gly mutant) and block potential sites for geminate trapping. As a consequence, this blocking increases the total time necessary for the ligand to escape from the heme pocket. Molecular dynamics simulations performed on other distal side double mutants (mutated at Val68 and Ile107) showed that a pattern of cavities fluctuates and interconverts as a result of protein motions [134]. The authors suggested that these fluctuations influence access to the iron atom and therefore affect recombination of the ligand. The positions of the helices around the distal pocket were also monitored, and it was demonstrated that these helices accommodate the mobile diatomic ligand, which suggests a mechanism for communication between the heme pocket and the exterior of the protein. Although theoretical calculations reproduced the rebinding properties of mutant myoglobins qualitatively, the lack of a specific iron-ligand potential and the short simulation times did not allow the calculation of absolute rebinding rates. The problems associated with the lack of an iron-ligand potential function were overcome by Li et al. [130], who constructed the ground and excited state potentials between NO and the heme on the basis of experimental data.
These two crossing potential energy functions were introduced into the conventional CHARMM force field, and classical ground and excited state trajectories were calculated on these surfaces. Crossing between the two states was modelled via the Landau-Zener formula [135]. Switching between the binding and non-binding heme potentials involved a switching function defined for nuclear motion on the electronic ground state potential surface. After the excited state and crossing parameters had been adjusted by trial and error, short time scale simulations reproduced the experimentally observed order of recombination. This qualitative agreement with experimental data allowed an in-depth investigation of the protein co-ordinates over a longer time scale (0.5 fs). One of the most important conclusions of this study is that no relaxation of the iron atom was observed beyond a few picoseconds. A similar result was obtained by Eaton et al. in simulations of NO recombination in myoglobin [100], but this conclusion contrasts with that of Petrich et al. [67], who identified a long time relaxation of the iron atom, and with the results of Kuczera et al., who identified a time-dependent shift in the iron position [125].
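In a trajectory treatment of this kind, each passage through the crossing of the two diabatic surfaces is assigned a probability of changing electronic state. A minimal sketch, assuming the standard single-passage Landau-Zener expression P = exp(-2π H12² / (ħ v |ΔF|)), where H12 is the coupling, v the nuclear velocity and ΔF the difference in slopes at the crossing. The numerical values are illustrative atomic-unit assumptions; the parameters actually used by Li et al. are not given here.

```python
import math


def landau_zener_diabatic(h12, velocity, slope_difference, hbar=1.0):
    """Landau-Zener probability of remaining on the same diabatic curve
    (i.e. hopping between the adiabatic surfaces) in one passage through
    the crossing point.

    h12: electronic coupling between the two states at the crossing
    velocity: nuclear velocity along the crossing coordinate
    slope_difference: |dV1/dR - dV2/dR| evaluated at the crossing
    With the default hbar = 1, all inputs are taken to be in atomic units.
    """
    if velocity == 0.0 or slope_difference == 0.0:
        raise ValueError("velocity and slope difference must be non-zero")
    exponent = 2.0 * math.pi * h12 ** 2 / (hbar * abs(velocity) * abs(slope_difference))
    return math.exp(-exponent)


def state_change_probability(h12, velocity, slope_difference, hbar=1.0):
    """Probability of ending on the other diabatic state (adiabatic passage)."""
    return 1.0 - landau_zener_diabatic(h12, velocity, slope_difference, hbar)


if __name__ == "__main__":
    # Illustrative atomic-unit values, not parameters from the cited study.
    for v in (5e-4, 1e-3, 2e-3):
        p = state_change_probability(1e-3, v, 1e-2)
        print(f"velocity {v:.1e} a.u.: state-change probability {p:.3f}")
```

Slower passages and stronger coupling both raise the probability of changing state, which is why the crossing parameters had to be tuned against the experimental recombination order.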
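The idea of a switching function that moves a single potential smoothly between a bound and a ligand-free heme, as mentioned above, can be sketched as a weighted blend along the Fe-N distance r: a Morse term for the bound state, an arbitrary unbound term, and a weight that goes from 1 to 0 between two cut-offs. The CHARMM-style polynomial switch, the Morse parameters and the cut-offs below are assumptions for illustration only; the published switching functions may act on individual energy terms and on coordinates other than r.

```python
import math


def morse(r, depth, alpha, r0):
    """Morse bond energy shifted to -depth at r0 and to 0 at large r."""
    x = 1.0 - math.exp(-alpha * (r - r0))
    return depth * x * x - depth


def switch(r, r_on, r_off):
    """Smooth switching function: 1 for r <= r_on, 0 for r >= r_off.
    Uses the CHARMM-style polynomial between the two cut-offs, which has a
    continuous value and a zero first derivative at both ends."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    ron2, roff2, r2 = r_on * r_on, r_off * r_off, r * r
    return (roff2 - r2) ** 2 * (roff2 + 2.0 * r2 - 3.0 * ron2) / (roff2 - ron2) ** 3


def rebinding_energy(r, unbound_potential, depth=100.0, alpha=2.0, r0=1.75,
                     r_on=2.5, r_off=4.0):
    """Blend the bound (Morse) and ligand-free energies by the Fe-N distance r.
    unbound_potential is a callable giving the ligand-free energy at r.
    All default parameters are hypothetical placeholders
    (kJ/mol, 1/Angstrom, Angstrom)."""
    s = switch(r, r_on, r_off)
    return s * morse(r, depth, alpha, r0) + (1.0 - s) * unbound_potential(r)


if __name__ == "__main__":
    flat = lambda r: 0.0  # hypothetical ligand-free energy: no Fe-NO bonding term
    for r in (1.75, 2.5, 3.0, 3.5, 4.0, 5.0):
        print(f"r = {r:4.2f} A: E = {rebinding_energy(r, flat):8.3f}")
```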
Simulations performed by Henry et al. [136] also support this conclusion. These authors independently carried out molecular dynamics simulations using a potential function similar to that applied by Li et al. [130]. To investigate the kinetics of NO rebinding to native myoglobin, they used a potential function that switches between non-binding and binding potentials as a function of the position of the ligand. Three distinct potential functions were applied to simulate dissociation and subsequent rebinding. The potential function of the hexacoordinated heme with bound NO (ligand binding potential) contains iron-ligand interaction terms, including a Morse potential for the distance between the iron and the nitrogen atom of the ligand. The second potential was designed to describe the ligand-free heme (ligand free potential); here the bonding interactions between the ligand and the heme were replaced by nonbonding interactions between NO and the pyrrole nitrogens. The third and most important, the rebinding potential, included bonding terms and several switching functions. One of these switching functions was used to attenuate the bending interactions as the iron-ligand distance increases. Another switching function was used to attenuate the van der Waals interactions between the ligand and the pyrrole nitrogens of the heme. This function was constructed so that the rebinding potential switches between the ligand binding and ligand free potentials: the ligand free potential applies when the ligand is located far from the binding site or its orientation is unfavourable for binding, and it changes to the binding potential at optimal ligand positions. Based on an analysis of the iron displacement along 12 trajectories, Henry and co-workers [136] could not identify time-dependent changes. In agreement with Li et al. [130], their analysis suggested that conformational substates which differ in iron position and do not interconvert rapidly on the time scale of NO rebinding make no contribution to the nonexponential rebinding of NO. Since CO rebinding was found to be slower than that of NO, it is still possible that iron displacement plays some role in the non-exponential rebinding of CO. Transition state analysis for NO rebinding, however, clearly demonstrated that this motion is not responsible for the non-exponential rebinding of NO. The kinetics of NO rebinding were described using five sets of 20 trajectories, each set originating from a different 200 ps ligated trajectory. These sets were therefore treated as conformational substates, for each of which the kinetic curve was calculated. The progress curves differed very little: in 96 of the 100 trajectories the ligand rebound within 15 ps. Analysing ligand positions along these trajectories, the authors concluded that NO remained in a pocket formed by Phe43, His64, Leu29, Val68 and Ile107, very close to the iron. Four of the 100 trajectories indicated that NO can also escape from the heme pocket, suggesting competition with the fast geminate rebinding. The experimental kinetic curve was well described by a double exponential function with time constants of 28 ps and 280 ps. A comparison between this and the calculated curve revealed that the measured 28 ps relaxation corresponds to binding from the heme pocket. The multi-exponential nature of rebinding, however, suggests the coexistence of multiple pockets inside the protein. The agreement between the escape rates calculated from simulations and those obtained from experiments also supports this hypothesis. In addition to the electronic barrier associated with the crossing potential surfaces introduced by Li et al. [130], Henry et al. [136] suggested two further contributions to the free energy of rebinding. One is the steric interaction between the ligand and distal side residues; the other is the requirement that the ligand be located in the correct orientation for binding. These contributions were approximated using Langevin dynamics, by calculating the enthalpy and entropy effects of steric interactions and the entropy effect of optimal ligand positioning. Although this method was considered to give only a rough estimate of the magnitude of these barriers, the results agreed well with early suggestions [119] that the bent orientation of NO allows this ligand to bind without significant steric barriers.
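A double exponential description of a rebinding curve can be made concrete with a small fitting sketch. The code below uses exponential peeling, a classic technique swapped in here for illustration (the fitting procedures of the cited works are not described in this chapter). It fits the logarithm of the tail to obtain the slow phase, subtracts that phase, and fits the logarithm of the remainder at early times to obtain the fast phase. The 28 ps and 280 ps time constants come from the text; the amplitudes 0.8 and 0.2 are hypothetical.

```python
import math


def double_exponential(t, a_fast, tau_fast, a_slow, tau_slow):
    """Unbound fraction N(t) written as a sum of two exponential phases."""
    return a_fast * math.exp(-t / tau_fast) + a_slow * math.exp(-t / tau_slow)


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def peel_two_exponentials(times, values, tail_start, head_end):
    """Exponential peeling: fit ln N(t) on the tail to get the slow phase,
    subtract that phase, then fit the logarithm of the remainder at early times."""
    tail_t = [t for t in times if t >= tail_start]
    tail_y = [math.log(v) for t, v in zip(times, values) if t >= tail_start]
    slope, intercept = linear_fit(tail_t, tail_y)
    a_slow, tau_slow = math.exp(intercept), -1.0 / slope

    head_t = [t for t in times if t <= head_end]
    head_y = [math.log(v - a_slow * math.exp(-t / tau_slow))
              for t, v in zip(times, values) if t <= head_end]
    slope, intercept = linear_fit(head_t, head_y)
    return (math.exp(intercept), -1.0 / slope), (a_slow, tau_slow)


if __name__ == "__main__":
    times = [float(t) for t in range(0, 1001, 5)]
    # 28 ps and 280 ps come from the text; the 0.8/0.2 amplitudes are hypothetical.
    curve = [double_exponential(t, 0.8, 28.0, 0.2, 280.0) for t in times]
    fast, slow = peel_two_exponentials(times, curve, tail_start=300.0, head_end=60.0)
    print(f"fast: A = {fast[0]:.3f}, tau = {fast[1]:.1f} ps")
    print(f"slow: A = {slow[0]:.3f}, tau = {slow[1]:.1f} ps")
```

Peeling works well only when the two time constants are well separated, as they are here (a factor of ten); with noisy data a simultaneous non-linear fit of both phases is usually preferable.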
6. LIGAND MIGRATION
As was already noted when Perutz and Matthews solved the first X-ray structure of myoglobin, the protein environment of the heme, if rigid, would prevent the entrance and exit of even small ligands [1]. Protein fluctuations of myoglobin are therefore certainly a crucial part of its biological function. In the first theoretical search for escape routes, the O2 molecule was treated as a high-energy sphere making its way through the rigid protein matrix of myoglobin. The majority of the escaping molecules passed His64, Thr67 and Val68; a second group first moved to a pocket bordered by Leu29, Leu61 and Phe33 and then left the protein between Leu61 and Phe33. However, these pathways had very high energy barriers of around 420 kJ/mol [132]. This number could be reduced to about 40 kJ/mol in a fluctuating protein, which would be a biologically acceptable value. In a series of photolysis experiments on the O2 and CO complexes of several distal side mutants of Mb, polarity rather than size was identified as the important factor in the contribution of amino acids to the barrier height between the heme pocket and the solvent [128]. Another mutation study suggested that the distal histidine plays a role in blocking ligand escape [137]; however, there was no compelling evidence that the major pathway in the wild type protein also passes the distal histidine. The results of fluorescence quenching measurements indicated that a rather large portion of the protein can be penetrated by the physiological ligands of the globins [138,139,140]. This finding has gained theoretical support from xenon binding studies. A molecular dynamics simulation in which all protein atoms were allowed to move revealed that Xe atoms can take rather complicated routes prior to exit, spending a considerable amount of time in a connected network of channel-like pathways within the protein interior [141]. It has also been concluded that protein fluctuations cause changes of 3 to 4% in the protein volume compared with the crystallographically determined volume. This leads to changes in the shape, size and location of those cavities that are large enough to host a ligand molecule [142]. All molecular dynamics studies of ligand migration have had to face a choice: either several ligand trajectories are examined in a rigid protein environment, or protein motion is allowed, which, however, limits the number of trajectories that can be computed. These limitations were overcome in the work of Elber and Karplus [126], who applied the time-dependent Hartree approximation, as implemented by Gerber et al. [143], to this problem. This made it possible to treat an ensemble of photodissociated CO molecules moving through the protein matrix simultaneously. The CO molecules move in the field of a single set of protein co-ordinates; the protein trajectory, however, is approximate, since the protein atoms move in the average field of all CO molecules. The basic hypothesis of the method is that the ligands do not introduce a major perturbation into the protein fluctuations, which are expected to be governed by the interactions within the protein. The kinetic energy of the ligand was increased by heating it at fixed time intervals for short periods and then gradually cooling it to room temperature, while the protein temperature was kept near 300 K by velocity scaling.
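The mean-field coupling at the heart of the time-dependent Hartree (locally enhanced sampling) treatment can be illustrated with a deliberately tiny model: one protein "gate" coordinate X and N non-interacting ligand copies x_i on a line. Each copy feels the full force from the single gate, while the gate feels only the copy-averaged reaction force. The harmonic gate, the Gaussian gate-ligand repulsion, the masses and all parameter values are hypothetical and chosen only to show the coupling pattern, not to represent myoglobin.

```python
import math

# Toy one-dimensional model; all functional forms and parameters are hypothetical.
K_GATE = 1.0    # harmonic restoring constant of the protein "gate" coordinate X
BARRIER = 5.0   # height of the Gaussian gate-ligand repulsion
WIDTH = 0.5     # width of that repulsion


def pair_energy(X, x):
    """Repulsion between the gate at X and one ligand copy at x."""
    return BARRIER * math.exp(-((x - X) ** 2) / (2.0 * WIDTH ** 2))


def force_on_copy(X, x):
    """-dU/dx for one ligand copy; the force on the gate from that copy is its negative."""
    return pair_energy(X, x) * (x - X) / WIDTH ** 2


def forces(X, copies):
    """Each copy feels the full field of the single gate coordinate; the gate
    feels only the copy-averaged reaction force (mean-field coupling).
    Copies do not interact with one another."""
    f_copies = [force_on_copy(X, x) for x in copies]
    f_gate = -K_GATE * X - sum(f_copies) / len(copies)
    return f_gate, f_copies


def velocity_verlet(X, V, copies, vels, dt, n_steps, m_gate=10.0, m_lig=1.0):
    """Propagate the gate and all copies together with velocity Verlet."""
    f_gate, f_c = forces(X, copies)
    for _ in range(n_steps):
        V += 0.5 * dt * f_gate / m_gate
        vels = [v + 0.5 * dt * f / m_lig for v, f in zip(vels, f_c)]
        X += dt * V
        copies = [x + dt * v for x, v in zip(copies, vels)]
        f_gate, f_c = forces(X, copies)
        V += 0.5 * dt * f_gate / m_gate
        vels = [v + 0.5 * dt * f / m_lig for v, f in zip(vels, f_c)]
    return X, V, copies, vels


if __name__ == "__main__":
    # Sixty copies (as in the CO simulations) started at one point with different velocities.
    n = 60
    copies = [-1.5] * n
    vels = [0.5 + 0.05 * i for i in range(n)]
    X, V, copies, vels = velocity_verlet(0.0, 0.0, copies, vels, dt=0.01, n_steps=2000)
    crossed = sum(1 for x in copies if x > 1.5)
    print(f"gate at {X:.3f}; {crossed} of {n} copies crossed the gate")
```

With a single copy the scheme reduces to ordinary two-body dynamics; with many copies the gate's motion is an average response, which is exactly the approximation described above for the protein trajectory.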
The kinetic energy of the ligand was increased because classical dynamics has been shown to be inefficient at sampling the rare protein fluctuations that assist ligand escape [132]. Solvent was not included in the calculations, for which the models were derived from the X-ray structure of carbonmonoxy myoglobin [43]. The energy was minimised using deoxy heme parameters, so that only non-bonded interactions were defined between the heme and the CO ligand. Three different sets of simulations (100 ps each) were carried out, each using sixty CO molecules: one in which all atoms were free to move, one constraining all protein and heme atoms, and one in which the polypeptide backbone and heme were rigid but the side chains and ligands were allowed to move. Four significant cavities and one semi-cavity, and thus five major pathways within the protein matrix, were identified: (1) the EF loop and the N-terminal loop, (2) the A/E helices, (3) the AB loop and the G helix, (4) the proximal histidine and (5) the CD loop. The diffusive motion of the ligand could be described by a few-site hopping scheme in which the ligand is trapped for a significant time in individual cavities and then hops to another cavity or, finally, to the solvent. Even with the high temperature ligand, only a few molecules were able to escape from the rigid or partly restricted protein matrix. Inspection of the trajectories showed that the barriers between the cavities are significantly reduced by protein fluctuations. Czerminski and Elber studied the diffusion of CO in lupine leghemoglobin, a protein with an overall fold similar to that of myoglobin but with different, much faster, diffusion rates. Their goal was to compare theoretical estimates of diffusion for the two proteins [144]. The locally enhanced sampling (LES) protocol, quite similar to that used by Elber and Karplus [126], suggested that the mechanism of ligand penetration and escape differs between the two proteins. In myoglobin many alternative routes exist, whereas in leghemoglobin the same technique showed the ligand escaping along a well-defined, practically unique path. The results of this LES study were later refined using the self penalty walk (SPW) algorithm [145] developed by Czerminski and Elber [146]. SPW provides a more detailed picture than LES methods, since barrier heights for diffusion can be estimated and a significant fraction of the protein motions that are not coupled to the diffusion of the ligand can be quenched. The method thus helps to elucidate structural features of gate openings. Three distinct reaction coordinates were explored, following three diffusion paths. The local properties of the co-ordinates in the vicinity of the CO ligand were found to be similar, supporting the original view of a single escape channel in leghemoglobin. The diffusion process consisted of only two steps, in contrast to the few-site hopping model of CO diffusion in myoglobin established by Elber and Karplus [126]. In the first step the ligand jumps to a cavity in the protein matrix, assisted by the tilt of Phe29; it then hops to the exterior in a step involving global translations and rotations of helices C and G.
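The contrast between the few-site hopping picture for myoglobin and the two-step route in leghemoglobin can be explored with a kinetic Monte Carlo (Gillespie) sketch. The ligand waits an exponentially distributed time in each site and then hops to a neighbour with probability proportional to the corresponding first-order rate constant, until it reaches the solvent. The site names, network shapes and all rate constants are hypothetical; they only mimic the qualitative pictures in the text.

```python
import random

# Hypothetical first-order hopping rate constants in ps^-1 (rates[a][b] is a -> b).
# The network shapes mirror the qualitative pictures in the text; the numbers are invented.
MULTI_SITE = {  # myoglobin-like: several interconnected cavities, several exits
    "pocket": {"A": 0.05, "B": 0.05, "C": 0.02},
    "A": {"pocket": 0.05, "B": 0.02, "solvent": 0.01},
    "B": {"pocket": 0.05, "A": 0.02, "C": 0.02, "solvent": 0.005},
    "C": {"pocket": 0.03, "B": 0.02, "solvent": 0.02},
}
TWO_STEP = {  # leghemoglobin-like: pocket -> cavity -> solvent, a single route
    "pocket": {"cavity": 0.1},
    "cavity": {"pocket": 0.02, "solvent": 0.05},
}


def hop_until_escape(rates, rng, start="pocket", sink="solvent"):
    """Kinetic Monte Carlo (Gillespie) walk from start until the sink is reached.
    Returns the escape time and the list of visited sites."""
    site, t, path = start, 0.0, [start]
    while site != sink:
        targets = list(rates[site].items())
        total = sum(k for _, k in targets)
        t += rng.expovariate(total)          # exponential waiting time in this site
        threshold = rng.random() * total     # pick the next site with probability k/total
        acc = 0.0
        for target, k in targets:
            acc += k
            if threshold < acc:
                break
        site = target                        # falls back to the last target on round-off
        path.append(site)
    return t, path


def mean_escape_time(rates, n_walks, seed=0):
    """Average escape time over n_walks independent walks."""
    rng = random.Random(seed)
    return sum(hop_until_escape(rates, rng)[0] for _ in range(n_walks)) / n_walks


if __name__ == "__main__":
    rng = random.Random(1)
    print("multi-site path:", hop_until_escape(MULTI_SITE, rng)[1])
    print("two-step path:  ", hop_until_escape(TWO_STEP, rng)[1])
    print("mean escape times (ps):",
          round(mean_escape_time(MULTI_SITE, 5000), 1),
          round(mean_escape_time(TWO_STEP, 5000), 1))
```

For the TWO_STEP rates, solving the mean first-passage equations by hand gives 34 ps, which is a convenient check on the sampler before it is applied to more complicated networks.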
In the first step of the process the barrier is local, whereas in the second step significant coupling to low frequency modes is observed, as shown in a further study of the problem [147]. A number of experimental studies complemented by LES calculations have been carried out on different point mutants of myoglobin, with a diverse set of results that tend to contradict the many-route escape model. In an experimental and theoretical study of ligand migration in myoglobin, in which over 25 different oxymyoglobin point mutants were studied by laser flash photolysis and the LES method, Scott and Gibson [148] concluded that secondary docking sites of O2 are unlikely and, by exclusion, returned to the original hypothesis of escape through the histidine gate suggested by Case and Karplus [132]. In this picture, after photolysis the ligand molecules move toward the interior of the protein within the first few picoseconds and then return to the proximity of the iron, either to be recaptured or to escape, giving rise to the so-called primary recombination with a relaxation half-time of some 20 ns. Other ligands move to the site surrounded by Gly25, Ile28, His93, Val68 and Ile107, or to the edge of the heme below the heme plane, only to return and be recaptured on µs time scales in the so-called secondary recombination process. However, no actual escape route was mapped by the calculations, only several different recombination processes. A quite different escape model emerges from the work of Brunori et al. [149], who propose that in the case studied the ligand escapes through the secondary site, which disagrees with the hypothesis of Scott and Gibson [148]. A secondary docking site for small molecular ligands, overlapping the Xe(4) site [141], was also identified in a distal side triple mutant of myoglobin. The triple mutant was synthesised to rationalise the characteristic difference in O2 dissociation rates between Mb and Ascaris hemoglobin, the latter having an extremely low O2 dissociation rate. The distal pocket of Ascaris hemoglobin differs from that of Mb in only three amino acids, and this difference was mimicked by the Leu29Tyr, His64Gln and Thr67Arg point mutations. Although the mutations reproduced the H-bonding pattern stabilising the O2 ligand in Ascaris hemoglobin, as hoped, the dissociation rate for the triple mutant was still over 200-fold faster than that of Ascaris hemoglobin. To find a plausible explanation, the LES method of Elber and Karplus [126] was applied to study the migration paths of NO (a more reactive ligand with a similar diffusion constant) within the interior of both proteins. Simulations were started from the respective crystal structures, with all crystallographic water molecules included as explicit TIP3P [73] molecules. The runs were also repeated in the presence of xenon. Eight trajectories were collected, each over 50 ps. In five runs the ligand cloud stayed close to the binding position, within 4 Å of the iron.
In two runs, after 10 ps the δ-carbon of Ile107 swung around, opening a path communicating with the Xe(4) site, so that the ligands could dock into the cavity formed by Gly25, Ile28, Val68, Leu69 and Ile107, approximately 9 Å from the iron. According to the model, docking within this secondary site and the partial return of the ligands toward the iron might generate the slow component of the geminate recombination reaction measured for the mutant. In Ascaris hemoglobin, however, a Phe residue is found in place of Ile107 of Mb; instead of opening the gate toward the secondary site and, subsequently, toward escape of the ligand, it blocks this path. It therefore enhances the geminate recombination of the ligands that stay trapped in the primary site close to the iron. The authors propose this effect to be the cause of the unusually low dissociation rates measured for Ascaris hemoglobin.
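The argument that blocking the path to the secondary site raises geminate recombination can be made quantitative with a three-site first-order scheme: from the primary site the ligand either binds (k_bind) or moves to the secondary site (k_out); from the secondary site it either returns (k_back) or escapes (k_escape). Summing over any number of excursions gives a closed-form geminate yield. This scheme is an illustrative interpretation of the argument, not the LES analysis the authors performed, and all rate constants below are hypothetical.

```python
def geminate_yield(k_bind, k_out, k_back, k_escape):
    """Probability that a ligand starting in the primary (iron-proximal) site
    rebinds before it escapes, for the first-order scheme
        primary -> bound      (k_bind)
        primary -> secondary  (k_out)
        secondary -> primary  (k_back)
        secondary -> solvent  (k_escape)
    Summing over any number of primary/secondary excursions gives
    k_bind / (k_bind + k_out * p_esc), with p_esc = k_escape / (k_back + k_escape).
    """
    p_esc = k_escape / (k_back + k_escape)
    return k_bind / (k_bind + k_out * p_esc)


if __name__ == "__main__":
    # Hypothetical rate constants in ps^-1: an open Ile107-like gate versus a
    # Phe-blocked gate that makes the primary -> secondary step 100 times slower.
    open_gate = geminate_yield(k_bind=0.05, k_out=0.05, k_back=0.02, k_escape=0.01)
    blocked_gate = geminate_yield(k_bind=0.05, k_out=0.0005, k_back=0.02, k_escape=0.01)
    print(f"geminate yield, open gate:    {open_gate:.3f}")
    print(f"geminate yield, blocked gate: {blocked_gate:.3f}")
```

As k_out tends to zero the yield tends to one, so almost every dissociated ligand is recaptured and the overall dissociation rate falls, which is the qualitative effect proposed for Ascaris hemoglobin.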
REFERENCES
1. M.F. Perutz and F.S. Matthews, J. Mol. Biol., 21 (1965) 199.
2. C.L. Nobbs, H.C. Watson and J.C. Kendrew, Nature, 209 (1966) 339.
3. R. Elber and M. Karplus, Science, 235 (1987) 318.
4. R.M. Levy and M. Karplus, Biopolymers, 18 (1979) 2465.
5. M. Karplus and J.N. Kushick, Macromolecules, 14 (1981) 325.
6. M. Levitt, C. Sander and P.S. Stern, J. Mol. Biol., 181 (1985) 423.
7. W. Bialek and R.F. Goldstein, Biophys. J., 48 (1985) 1027.
8. P.G. Debrunner and H. Frauenfelder, Annu. Rev. Phys. Chem., 33 (1982) 283.
9. A. Ansari, J. Berendzen, S.F. Bowne, H. Frauenfelder, I.E.T. Iben, T.B. Sauke, E. Shyamsunder and R.D. Young, Proc. Natl. Acad. Sci. U.S.A., 82 (1985) 5000.
10. S. Swaminathan, T. Ichiye, W. van Gunsteren and M. Karplus, Biochemistry, 21 (1982) 5230.
11. J. Kuriyan, G.A. Petsko, R.M. Levy and M. Karplus, J. Mol. Biol., 190 (1986) 227.
12. J.L. Smith, W.A. Hendrickson, R.B. Honzatko and S. Sheriff, Biochemistry, 25 (1986) 5018.
13. A.M. Lesk and C. Chothia, J. Mol. Biol., 136 (1980) 225.
14. S. Corbin, J.C. Smith and G.R. Kneller, Proteins: Struct. Funct. Genet., 16 (1993) 141.
15. D.L. Stein, Proc. Natl. Acad. Sci. U.S.A., 82 (1985) 3670.
16. K. Kuczera, J. Kuriyan and M. Karplus, J. Mol. Biol., 213 (1990) 351.
17. H. Frauenfelder, G.A. Petsko and B. Bianchi, Nature, 280 (1979) 558.
18. H. Frauenfelder, H. Hartmann, M. Karplus, I.D. Kuntz, J. Kuriyan, F. Parak, G.A. Petsko, D. Ringe, R.F. Tilton, M.L. Connolly and N. Max, Biochemistry, 26 (1987) 254.
19. H. Hartmann, F. Parak, W. Steigemann, G.A. Petsko, D.R. Ponzi and H. Frauenfelder, Proc. Natl. Acad. Sci. U.S.A., 79 (1982) 4967.
20. F. Parak, E.N. Frolov, R.L. Mössbauer and V.I. Goldanskii, J. Mol. Biol., 145 (1981) 825.
21. J. Smith, K. Kuczera and M. Karplus, Proc. Natl. Acad. Sci. U.S.A., 87 (1990) 1701.
22. F. Parak, E.W. Knapp and D. Kucheida, J. Mol. Biol., 161 (1982) 177.
23. E.R. Henry, Biophys. J., 64 (1993) 869.
24. W. Nowak, J. Mol. Struct. (THEOCHEM), 398-399 (1997) 537.
25. J. Smith, K. Kuczera, B. Tidor, W. Doster, S. Cusack and M. Karplus, Physica B, 156-157 (1989) 437.
26. G.R. Kneller and J.C. Smith, J. Mol. Biol., 242 (1994) 181.
27. D.J. Danziger and P.M. Dean, Proc. Roy. Soc. Lond., B236 (1989) 101.
28. M. Schmidt, F. Parak and G. Coronghiu, Int. J. Quant. Chem., 59 (1996) 263.
29. W. Gu and B.P. Schoenborn, Proteins: Struct. Funct. Genet., 22 (1995) 20.
30. W. Gu, A.E. Garcia and B.P. Schoenborn, Basic Life Sci., 64 (1996) 289.
31. V. Lounnas and M.B. Pettitt, Proteins: Struct. Funct. Genet., 18 (1994) 133.
32. H. Frauenfelder, F. Parak and R.D. Young, Annu. Rev. Biophys. Biophys. Chem., 17 (1988) 451.
33. Y.F. Krupyanskii, F. Parak, V.I. Goldanskii, R.L. Mössbauer, E.E. Gaubman, H. Engelmann and I.P. Suzdalev, Z. Naturforsch., C37 (1982) 57.
34. P.J. Steinbach and B.R. Brooks, Proc. Natl. Acad. Sci. U.S.A., 90 (1993) 9135.
35. P.J. Steinbach and B.R. Brooks, Proc. Natl. Acad. Sci. U.S.A., 93 (1996) 55.
36. P.J. Steinbach and B.R. Brooks, Chem. Phys. Lett., 226 (1994) 447.
37. B.K. Andrews, T. Romo, J.B. Clarage, M.B. Pettitt and N.G. Phillips, Structure, 6 (1998) 587.
38. C.L. Brooks III, J. Mol. Biol., 227 (1992) 375.
39. J.D. Hirst and C.L. Brooks III, Biochemistry, 34 (1995) 7614.
40. S.E.V. Phillips and B.P. Schoenborn, Nature, 292 (1981) 81.
41. L. Stryer, Biochemistry, W.H. Freeman & Co, New York, 1988.
42. B.A. Springer, S.G. Sligar, J.S. Olson and G.N. Phillips Jr., Chem. Rev., 94 (1994) 699.
43. J. Kuriyan, S. Wilz, M. Karplus and G.A. Petsko, J. Mol. Biol., 192 (1986) 133.
44. X. Cheng and B.P. Schoenborn, J. Mol. Biol., 220 (1991) 381.
45. M.L. Quillin, R.M. Arduini, J.S. Olson and G.N. Phillips Jr., J. Mol. Biol., 234 (1993) 140.
46. F. Young and G.N. Phillips Jr., J. Mol. Biol., 256 (1996) 762.
47. M. Lim, T.A. Jackson and P.A. Anfinrud, Science, 269 (1995) 962.
48. J.T. Sage and W. Jee, J. Mol. Biol., 274 (1997) 21.
49. J.P. Collman, J.I. Brauman, T.R. Halbert and K.S. Suslick, Proc. Natl. Acad. Sci. U.S.A., 73 (1976) 3333.
50. D.A. Case and M. Karplus, J. Mol. Biol., 123 (1978) 697.
51. T. Li, M.L. Quillin, G.N. Phillips Jr. and J.S. Olson, Biochemistry, 33 (1994) 1433.
52. M. Lim, T.A. Jackson and P.A. Anfinrud, J. Chem. Phys., 102 (1995) 4355.
53. J.O. Alben, D. Beece, S.F. Bowne, W. Doster, L. Eisenstein, H. Frauenfelder, D. Good, D. McDonald, M.C. Marden, P.P. Moh, L. Reinisch, A.H. Reynolds, E. Shyamsunder and K.T. Yue, Proc. Natl. Acad. Sci. U.S.A., 79 (1982) 3744.
54. E. Oldfield, K. Guo, J.D. Augspurger and C.E. Dykstra, J. Am. Chem. Soc., 113 (1991) 7537.
55. J. Vojtechovsky, K. Chu, J. Berendzen, R.M. Sweet and I. Schlichting, Biophys. J., 77 (1999) 2153.
56. E.E. Abola, J.L. Sussman, J. Prilusky and N.O. Manning, Methods Enzymol., 277 (1997) 556.
57. J.L. Sussman, L. Lin, J. Jiang and N.O. Manning, Acta Cryst., D54 (1998) 1078.
58. G.S. Kachlova, A.N. Popov and H.D. Bartunik, Science, 284 (1999) 473.
59. S. Bhattacharya and J.T. Lecomte, Biophys. J., 73 (1997) 3241.
60. R.F. Eich, T. Li, D.D. Lemon, D.H. Doherty, S.R. Curry, J.F. Aitken, A.J. Mathews, K.A. Johnson, R.D. Smith, G.N. Phillips Jr. and J.S. Olson, Biochemistry, 35 (1996) 6976.
61. M. Hoshino, K. Ozawa, H. Seki and P.C. Ford, J. Am. Chem. Soc., 115 (1993) 9568.
62. V.S. Sharma, R.A. Isaacson, M.E. John, M.R. Waterman and M. Chevien, Biochemistry, 32 (1993) 3897.
63. N.V. Gordunov, A.N. Osipov, B.W. Day, B. Zayas-Rivera, V. Kagan and N.M. Elsayed, Biochemistry, 34 (1995) 6689.
64. L.J. Ignarro, C.M. Buga, K.S. Wood, R.W. Byrns and G. Chaudhuri, Proc. Natl. Acad. Sci. U.S.A., 84 (1987) 9265.
65. R.M.J. Palmer, A.G. Ferrige and S. Moncada, Nature, 327 (1987) 524.
66. M. Hoshino, K. Ozawa, H. Seki and P.C. Ford, J. Am. Chem. Soc., 115 (1993) 9568.
67. J.W. Petrich, J.C. Lambry, K. Kuczera, M. Karplus, C. Poyart and J.L. Martin, Biochemistry, 30 (1991) 3975.
68. M.L. Carlson, R. Regan, R. Elber, H. Li, G.N. Phillips Jr. and Q.H. Gibson, Biochemistry, 33 (1994) 10497.
69. E.A. Brucker, J.S. Olson, M. Ikeda-Saito and G.N. Phillips Jr., Proteins: Struct. Funct. Genet., 30 (1998) 352.
70. M.A. Lopez and P.A. Kollman, Protein Sci., 2 (1993) 1975.
71. P. Jewsbury and T. Kitagawa, Biophys. J., 67 (1994) 2236.
72. X. Cheng, J.C. Norwell, A.C. Nunes and B.P. Schoenborn, Science, 190 (1975) 568.
73. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey and M.L. Klein, J. Chem. Phys., 79 (1983) 926.
74. P. Jewsbury and T. Kitagawa, Biophys. J., 68 (1995) 1283.
75. P. Jewsbury, S. Yamamoto, T. Minato, M. Saito and T. Kitagawa, J. Am. Chem. Soc., 226 (1994) 11586.
76. P. Jewsbury, S. Yamamoto, T. Minato, M. Saito and T. Kitagawa, J. Phys. Chem., 99 (1995) 12677.
77. D.K. Menyhárd and G.M. Keserű, J. Am. Chem. Soc., 120 (1998) 79911.
78. M.L. Quillin, R.M. Arduini, J.S. Olson and G.N. Phillips Jr., J. Mol. Biol., 243 (1993) 140.
79. C.W. Rella, K. Rector, A. Kwok, J.R. Hill, H.A. Schwettmann, D.D. Dlott and M.D. Fayer, J. Phys. Chem., 100 (1996) 15620.
80. B. Kushkuley and S.S. Stavrov, Biophys. J., 70 (1996) 1214.
81. B. Kushkuley and S.S. Stavrov, Biophys. J., 72 (1997) 899.
82. G.N. Phillips Jr., M.L. Teodoro, T. Li, B. Smith and J.S. Olson, J. Phys. Chem., 103 (1999) 8817.
83. M. Davis, J. Madura, B. Luty and J.A. McCammon, Program. Comput. Phys. Commun., 62 (1990) 187.
84. A. Gosh and F.D. Bocian, J. Phys. Chem., 100 (1996) 6363.
85. U. von Barth and L.J. Hedin, Phys. Chem., 5 (1972) 1629.
86. X.Y. Li and T.G. Spiro, J. Am. Chem. Soc., 110 (1988) 6024.
87. M.T. McMahon, A.C. DeDios, N. Godbout, R. Salzmann, D.D. Laws, H. Le, R.H. Havlin and E. Oldfield, J. Am. Chem. Soc., 120 (1998) 4784.
88. V.G. Malkin, O.L. Malkina, M.E. Casida and D.R. Salahub, J. Am. Chem. Soc., 116 (1994) 5898.
89. W. Kutzelnigg, U. Fleischer and M. Schindler, in NMR - Basic Principles and Progress, Vol. 28, Springer, Heidelberg, 1990, p. 1965.
90. A.J.H. Wachters, J. Chem. Phys., 52 (1970) 1033.
91. H. Le, J.G. Pearson, A.C. DeDios and E. Oldfield, J. Am. Chem. Soc., 117 (1995) 3800.
92. T.G. Spiro and P.M. Kozlowski, J. Am. Chem. Soc., 120 (1998) 4524.
93. R.H. Austin, K.W. Beece, L. Eisenstein, H. Frauenfelder and I.C. Gunsalus, Biochemistry, 14 (1975) 5355.
94. A. Ansari, J. Berendzen, D. Braunstein, B.R. Cowen, H. Frauenfelder, M.K. Hong, I.E.T. Iben, J.B. Johnson, P. Ormos, T.S. Sauke, R. Scholl, A. Schulte, P.J. Steinbach, J. Vittitow and R.D. Young, Biophys. Chem., 26 (1987) 337.
95. A. Ansari, E.E. Dilorio, D.D. Dlott, H. Frauenfelder, P. Langer, H. Roder, T.B. Sauke and E. Shyamsunder, Biochemistry, 25 (1986) 3139.
96. M.D. Chatfield, K.N. Walda and D. Magde, J. Am. Chem. Soc., 112 (1990) 4680.
97. B.F. Campbell, M.R. Chance and J.M. Friedman, Science, 238 (1987) 373.
98. P.A. Anfinrud, C. Han and R.M. Hochstrasser, Proc. Natl. Acad. Sci. U.S.A., 86 (1989) 8387.
99. J.L. Martin, A. Migus, C. Poyart, Y. Lecarpentier, R. Astier and A. Antonetti, Proc. Natl. Acad. Sci. U.S.A., 80 (1983) 173.
100. E.R. Henry, M. Levitt and W.A. Eaton, Proc. Natl. Acad. Sci. U.S.A., 82 (1985) 2034.
101. I. Schlichting, J. Berendzen, G.N. Phillips Jr. and R.M. Sweet, Nature, 371 (1994) 808.
102. T.Y. Teng, V. Srajer and K. Moffat, Nature Struct. Biol., 1 (1994) 701.
103. H. Hartmann, S. Zinser, P. Komninos, R.T. Schneider, G.U. Nienhaus and F. Parak, Proc. Natl. Acad. Sci. U.S.A., 93 (1996) 7013.
104. M. Lim, T.A. Jackson and P.A. Anfinrud, Nature Struct. Biol., 4 (1997) 209.
105. J.O. Alben et al., Phys. Rev. Lett., 44 (1980) 1157.
106. J.E. Straub and M. Karplus, Chem. Phys., 158 (1991) 221.
107. D. Vitkup, G.A. Petsko and M. Karplus, Nature Struct. Biol., 4 (1997) 202.
108. J. Ma, S. Huo and J.E. Straub, J. Am. Chem. Soc., 119 (1997) 2541.
109. J.L. Martin, A. Migus, C. Poyart, Y. Lecarpentier, R. Astier and A. Antonetti, Proc. Natl. Acad. Sci. U.S.A., 80 (1983) 173.
110. J. Meller and R. Elber, Biophys. J., 74 (1998) 789.
111. J.S. Weiner, P.A. Kollman and D.T. Nguyen, J. Comput. Chem., 7 (1986) 230.
112. W.L. Jorgensen and J. Tirado-Rives, J. Am. Chem. Soc., 110 (1988) 1657.
113. G.M. Keserű and D.K. Menyhárd, Biochemistry, 38 (1999) 6614.
114. W.C. Still, A. Tempczyk, R.C. Hawley and T. Hendrickson, J. Am. Chem. Soc., 112 (1990) 6127.
115. D.Q. McDonald and W.C. Still, Tetrahedron Lett., 33 (1992) 7743.
116. E.R. Henry, M. Levitt and R.M. Hochstrasser, Proc. Natl. Acad. Sci. U.S.A., 83 (1986) 8982.
117. P.J. Steinbach, A. Ansari, H.J. Berendzen, D. Braunstein, K. Chu, B.R. Cowan, D. Ehrenstein, H. Frauenfelder, J.B. Johnson, D.C. Lamb, S. Luck, J.R. Mourant, G.U. Nienhaus, P. Ormos, A. Xie and R.C. Young, Biochemistry, 30 (1991) 3988.
118. P.A. Cornelius, R.M. Hochstrasser and A.W. Steele, J. Mol. Biol., 163 (1983) 119.
119. A. Szabo, Proc. Natl. Acad. Sci. U.S.A., 75 (1978) 2108.
120. H. Frauenfelder and P.G. Wolynes, Science, 229 (1985) 337.
121. J.W. Petrich, C. Poyart and J.L. Martin, Biochemistry, 27 (1988) 4049.
122. K.A. Jongeward, D. Magde, D.J. Taube, J.C. Marsters, T.G. Traylor and V.S. Sharma, J. Am. Chem. Soc., 110 (1988) 380.
123. B.F. Campbell, M.R. Chance and J.M. Friedman, J. Mol. Biol., 262 (1987) 14885.
124. G.N. LaMar, F. Dalichow, X. Zhao, Y. Don, M. Ikeda-Saito, M.L. Chin and S.G. Sligar, J. Biol. Chem., 269 (1994) 29629.
125. K. Kuczera, J.C. Lambry, J.L. Martin and M. Karplus, Proc. Natl. Acad. Sci. U.S.A., 90 (1993) 5805.
126. R. Elber and M. Karplus, J. Am. Chem. Soc., 112 (1990) 9161.
127. Q.H. Gibson, R. Regan, R. Elber, J.S. Olson and T.E. Carver, J. Biol. Chem., 267 (1992) 22022.
128. T.E. Carver, R.J. Rohlfs, J.S. Olson, Q.H. Gibson, R.S. Blackmore, B.A. Springer and S.G. Sligar, J. Biol. Chem., 265 (1990) 20007.
129. T.E. Carver, R.E. Brantley, E.W. Singleton, R.M. Arduini, M.L. Quillin, G.N. Phillips and J.S. Olson, J. Biol. Chem., 267 (1992) 14443.
130. H. Li, R. Elber and J.E. Straub, J. Biol. Chem., 268 (1993) 17908.
131. K.D. Egeberg, B.A. Springer, S.G. Sligar, T.E. Carver, R.J. Rohlfs and J.S. Olson, J. Biol. Chem., 265 (1990) 11788.
132. D.A. Case and M. Karplus, J. Mol. Biol., 132 (1979) 343.
133. K.A. Johnson, J.S. Olson and G.N. Phillips, J. Mol. Biol., 207 (1989) 459.
134. M.L. Carlson, R. Regan and Q.H. Gibson, Biochemistry, 35 (1996) 1125.
135. L. Landau, Z. Phys. Sov., 2 (1932) 46.
136. O. Schaad, H.X. Zhou, A. Szabo, W.A. Eaton and E.R. Henry, Proc. Natl. Acad. Sci. U.S.A., 90 (1993) 9547.
137. T.E. Carver, J.S. Olson, S.J. Swerdon, S. Krzywda, A.J. Wilkinson, Q.H. Gibson, R.S. Blackmore, J.D. Ropp and S.G. Sligar, Biochemistry, 30 (1991) 4697.
138. J.R. Lakowicz and G. Weber, Biochemistry, 12 (1983) 4171.
139. M.R. Eftink and C.A. Ghiron, Anal. Biochem., 114 (1987) 199.
140. S.E. Englander and N.R. Kallenbach, Quart. Rev. Biophys., 16 (1983) 521.
141. R.F. Tilton Jr., U.C. Singh, S.J. Weiner, M.L. Connolly, I.D. Kuntz Jr., P.A. Kollman, N. Max and D.A. Case, J. Mol. Biol., 192 (1986) 443.
142. R.F. Tilton Jr., U.C. Singh, I.D. Kuntz Jr. and P.A. Kollman, J. Mol. Biol., 199 (1988) 195.
143. R.B. Gerber, V. Buch and M.A. Ratner, J. Chem. Phys., 77 (1982) 3022.
144. R. Czerminski and R. Elber, Proteins: Struct. Funct. Genet., 10 (1991) 70.
145. W. Nowak, R. Czerminski and R. Elber, J. Am. Chem. Soc., 113 (1991) 5627.
146. R. Czerminski and R. Elber, Int. J. Quant. Chem., 24 (1990) 167.
147. G. Verkhiver, R. Elber and Q.H. Gibson, J. Am. Chem. Soc., 114 (1992) 7866.
148. E.E. Scott and Q.H. Gibson, Biochemistry, 36 (1997) 11909.
149. M. Brunori, F. Cutruzzola, C. Savino, C. Travaglini-Allocatelli, B. Vallone and Q.H. Gibson, Biophys. J., 76 (1999) 1259.
L.A. Eriksson (Editor), Theoretical Biochemistry - Processes and Properties of Biological Systems
Theoretical and Computational Chemistry, Vol. 9
© 2001 Elsevier Science B.V. All rights reserved
Chapter 3

Mechanisms for Enzymatic Reactions Involving Formation or Cleavage of O-O Bonds

Per E.M. Siegbahn and Margareta R.A. Blomberg
Department of Physics, Stockholm University, Box 6730, S-113 85 Stockholm, Sweden
Theoretical studies of the important class of enzyme reactions in which an O-O bond is either formed or cleaved are described. Photosystem II is the only enzyme that can form O-O bonds from water, and suggested mechanisms for how this might occur are discussed. In contrast, several enzymes are able to cleave O-O bonds. The main examples discussed here are cytochrome oxidase and methane monooxygenase. Other examples described are heme peroxidases, manganese catalase and isopenicillin N synthase. General features of these reactions are discussed, and the reactions are shown to usually involve spin-state changes. The appearance of radicals and the critical roles of protonation are emphasized.
1. INTRODUCTION

Formation and cleavage of O2 are two of the most fundamental processes in nature and are therefore central in biochemistry [1]. The first organisms that developed the ability to form O2 from water and sunlight were cyanobacteria, which appeared more than two billion years ago. They used water as an unlimited source of protons and electrons in their metabolism and released O2 as a waste product. This led to a rather fast increase of O2 in the atmosphere, from small amounts to the present level of 21%. Initially, the large amount of O2 was disastrously toxic for the existing organisms. However, organisms soon developed that could use O2 to significantly increase the efficiency of ATP production, as well as for various efficient oxidation processes. Anaerobic glycolysis leads to the overall reaction (1),

$$ \mathrm{C_6H_{12}O_6} + 2\,\mathrm{ADP} + 2\,\mathrm{P_i} \longrightarrow 2\,\text{Lactate} + 2\,\mathrm{H^+} + 2\,\mathrm{H_2O} + 2\,\mathrm{ATP} \tag{1} $$

while aerobic metabolism of glucose leads to (2),

$$ \mathrm{C_6H_{12}O_6} + 38\,\mathrm{ADP} + 38\,\mathrm{P_i} + 6\,\mathrm{O_2} \longrightarrow 6\,\mathrm{CO_2} + 44\,\mathrm{H_2O} + 38\,\mathrm{ATP} \tag{2} $$
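A short script can check that equations (1) and (2) are balanced in atoms and charge, and compare their ATP yields. In the Python sketch below, the neutral formulas for ADP, ATP and Pi (chosen so that ADP + Pi = ATP + H2O) are bookkeeping assumptions made here. So is the treatment of lactate as an anion balanced by the explicit H+. Neither choice comes from the original text.

```python
import re
from collections import Counter

# Bookkeeping assumptions (not from the text): neutral, fully protonated forms
# ADP = C10H15N5O10P2, ATP = C10H16N5O13P3, Pi = H3PO4, so ADP + Pi = ATP + H2O.
# Lactate is taken as the anion C3H5O3(-), balanced by the explicit H(+).
SPECIES = {
    "glucose": ("C6H12O6", 0),
    "ADP": ("C10H15N5O10P2", 0),
    "Pi": ("H3PO4", 0),
    "ATP": ("C10H16N5O13P3", 0),
    "lactate": ("C3H5O3", -1),
    "H+": ("H", +1),
    "H2O": ("H2O", 0),
    "O2": ("O2", 0),
    "CO2": ("CO2", 0),
}


def atoms(formula):
    """Count atoms in a simple formula without parentheses, e.g. 'C6H12O6'."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def side_totals(side):
    """Sum atom counts and net charge over a {species: coefficient} mapping."""
    total, charge = Counter(), 0
    for name, coeff in side.items():
        formula, q = SPECIES[name]
        for element, n in atoms(formula).items():
            total[element] += coeff * n
        charge += coeff * q
    return total, charge


def is_balanced(left, right):
    """True if both sides carry the same atoms and the same net charge."""
    return side_totals(left) == side_totals(right)


eq1 = ({"glucose": 1, "ADP": 2, "Pi": 2},
       {"lactate": 2, "H+": 2, "H2O": 2, "ATP": 2})
eq2 = ({"glucose": 1, "ADP": 38, "Pi": 38, "O2": 6},
       {"CO2": 6, "H2O": 44, "ATP": 38})

print(is_balanced(*eq1), is_balanced(*eq2))   # True True
print(eq2[1]["ATP"] / eq1[1]["ATP"])           # 19.0, the ratio of ATP yields
```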
There is thus a 19-fold increase in the efficiency of ATP production when O2 is used. The reason for this increase is that O2, unlike other abundant substances, is thermodynamically relatively unstable. The double bond in O2 has a strength of only 118 kcal/mol, compared with, for example, 219 kcal/mol for the sum of the two O-H bond strengths in water. Therefore, the reaction between O2 and a large number of substances is strongly exothermic. Still, O2 is kinetically quite stable, partly because of its triplet ground state. By far the main biological use of O2 is in respiration; only a minor fraction is used for oxidizing different substrates.

The area of high-accuracy quantum chemical applications to biological systems is relatively new. One obvious reason is that models of biological processes in general have to contain a rather large number of atoms. For reactions involving O2 there is an additional reason: these processes usually need to be catalyzed by transition metal complexes, and transition metal systems have been regarded as quite difficult to treat by theoretical methods. The major problem in treating reactions involving transition metals accurately is that they are usually associated with large changes of both the dynamical and the non-dynamical correlation energies [2]. A decade ago, the high accuracy required for transition metal reactions demanded the most advanced ab initio methods, but these were too time-consuming even for small models of biological systems. At the same time, conventional Density Functional Theory (DFT) methods were not quite accurate enough. This situation changed markedly a few years ago, when terms depending on the gradient of the density, in particular for the exchange interaction, were introduced into DFT [3, 4].
This improvement, together with the introduction of a few semi-empirical parameters and a fraction of Hartree-Fock exchange, has led to an accuracy not far from that of the most accurate ab initio methods, at a small fraction of the cost [5]. The present review discusses enzyme reactions involving either formation of O2 or cleavage of O-O bonds. Because the field is so new, only a few of these reactions have been studied so far, but these will be described in relatively high detail. In nature, formation of O2 from water is performed by only one system, Photosystem II (PSII), which contains a tetramanganese cluster. Since no X-ray structure is yet available, a theoretical study of PSII is quite different in character from the other examples discussed here. The main case of O-O bond cleavage discussed is the one that occurs in cytochrome oxidase, the terminal enzyme in the respiratory chain. In this case the roles of the protons and electrons involved are critical and will therefore be discussed in detail. The other examples are the O-O bond cleavages in heme peroxidases, in methane monooxygenase (and ribonucleotide reductase), in manganese catalase and in isopenicillin N synthase. A general feature that distinguishes these reactions from most others is that they require a change of potential surface, usually leading to a change of spin. The general principles involved in these processes will be emphasized. For theoretical studies of other transition metal enzymes and other reactions, see recent reviews [6, 7, 8].
2. METHODS AND MODELS
All studies discussed in this review have used the B3LYP method [5, 9]. It is termed a hybrid DFT method because it uses Hartree-Fock exchange in addition to the normal density functionals. The B3LYP functional can be written as

$$ F^{\mathrm{B3LYP}} = (1-A)\,F_x^{\mathrm{Slater}} + A\,F_x^{\mathrm{HF}} + B\,F_x^{\mathrm{Becke}} + C\,F_c^{\mathrm{LYP}} + (1-C)\,F_c^{\mathrm{VWN}} \tag{3} $$

In Eq. (3), $F_x^{\mathrm{Slater}}$ is the Slater exchange and $F_x^{\mathrm{HF}}$ is the Hartree-Fock exchange. $F_x^{\mathrm{Becke}}$ is the gradient part of the exchange functional of Becke [3], $F_c^{\mathrm{LYP}}$ is the correlation functional of Lee, Yang and Parr [10], and $F_c^{\mathrm{VWN}}$ is the correlation functional of Vosko, Wilk and Nusair [11]. The coefficients A, B and C were determined [5] by a fit to experimental heats of formation. In that fit, the correlation functionals of Perdew and Wang [12] were used instead of $F_c^{\mathrm{VWN}}$ and $F_c^{\mathrm{LYP}}$.
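The way Eq. (3) mixes its five components can be written as a small function. The numerical coefficients below (A = 0.20, B = 0.72, C = 0.81) are the standard published B3LYP values. They are not stated in the text above, and the component energies passed in are hypothetical inputs. This is an illustration of the weighting only; it does not evaluate any functional.

```python
from dataclasses import dataclass

# Standard published B3LYP mixing coefficients (assumed here; not given in the text).
A, B, C = 0.20, 0.72, 0.81


@dataclass
class Components:
    """Hypothetical exchange-correlation energy components, in any consistent unit."""
    ex_slater: float   # local Slater exchange
    ex_hf: float       # exact (Hartree-Fock) exchange
    ex_becke: float    # Becke gradient correction to exchange
    ec_lyp: float      # Lee-Yang-Parr correlation
    ec_vwn: float      # Vosko-Wilk-Nusair local correlation


def b3lyp_xc(e: Components) -> float:
    """Combine the components with the weights of Eq. (3)."""
    return ((1 - A) * e.ex_slater + A * e.ex_hf + B * e.ex_becke
            + C * e.ec_lyp + (1 - C) * e.ec_vwn)
```

The weights (1 - A) and A sum to one, so local and exact exchange are interpolated rather than double counted. The same holds for the two correlation terms. Only the Becke gradient correction is scaled by a free factor B.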
The calculations described here were performed in two steps. First, a full geometry optimization was performed for each structure using the B3LYP method and standard double-zeta basis sets; for the metals, this implies non-relativistic effective core potentials (ECPs). The same basis set was used for the B3LYP Hessian calculations. In the second step, the B3LYP energy was evaluated at the optimized geometries using larger basis sets that include diffuse functions and a single set of polarization functions on each atom. All calculations were performed with GAUSSIAN-94 [13] or GAUSSIAN-98 [14]. The accuracy of different DFT methods has been tested on the standard G2 benchmark set, consisting of the enthalpies of formation of 148
small first- and second-row molecules [15]. These comparisons show that B3LYP is clearly superior to the other DFT methods, with an average deviation from experiment of only 3.11 kcal/mol [15]. For comparison, the corresponding deviations are 1.58 and 0.94 kcal/mol for the G2 [15] and G3 [16] methods, respectively, which are among the most accurate ab initio methods available. For the geometries of a 55-molecule subset of the G2 benchmark set, all DFT methods give quite accurate results, perhaps slightly more accurate for the hybrid methods [17]. It is also worth noting that geometries converge very quickly with basis set. Much less is known about the accuracy of DFT methods for transition metal complexes, owing to the lack of accurate experimental values. The few systematic theoretical studies that have been performed were recently discussed in a review [7]. For small cationic systems, B3LYP gave an average absolute error of 3-5 kcal/mol in calculated M-R bond energies, where M is a first-row transition metal and R is H, CH3, CH2 or OH. For the successive M-CO bond energies in first-row transition metal carbonyls, the average error was only 3 kcal/mol, and the results were in most cases within the experimental error bars. Of particular interest for the present review is a comparison for the O-H bond strength in MnO3(O-H)- [18], where the B3LYP result was found to be in good agreement with experiment. This system is similar to the model systems discussed below for PSII. In biochemical problems it may also be important to model the part of the enzyme that surrounds the region treated quantum mechanically. For the present type of transition metal complexes, effects from outside the metal complex have generally been found to be quite small. These effects are therefore reasonably well treated by simple dielectric cavity methods.
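The Born model for a point charge in a spherical cavity illustrates why a low dielectric constant already captures much of the screening, and why charged models are far more sensitive to the environment than neutral ones. It is the simplest dielectric cavity model and is used here only as an illustration. It is not the PCM or SCI-PCM treatment used in the studies discussed, and the 5 Å cavity radius is an arbitrary assumption.

```python
# Born model (illustration only): dG_solv = -(k * q**2 / (2 * a)) * (1 - 1/eps)
# k = 332.06 kcal*Angstrom/(mol*e^2) converts e^2/Angstrom into kcal/mol.
COULOMB_KCAL = 332.06


def born_energy(charge: float, radius: float, eps: float) -> float:
    """Born solvation free energy (kcal/mol) of a charge (in e) in a sphere of radius (Angstrom)."""
    return -(COULOMB_KCAL * charge ** 2 / (2.0 * radius)) * (1.0 - 1.0 / eps)


RADIUS = 5.0  # assumed cavity radius in Angstrom
for q in (1, 2):
    e4 = born_energy(q, RADIUS, 4.0)
    e80 = born_energy(q, RADIUS, 80.0)
    print(f"q = {q:+d}: eps = 4 -> {e4:7.1f}, eps = 80 -> {e80:7.1f} kcal/mol")
```

With ε = 4, the screening factor (1 - 1/ε) is already 0.75, compared with about 0.99 at ε = 80. However, the absolute Born term for a charged model amounts to tens of kcal/mol and grows with the square of the charge. A neutral model has no Born (monopole) term at all, which is one reason the charge state of the model matters.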
In some examples discussed here, the polarizable continuum model (PCM) of Tomasi et al. [19] was used. In most cases, however, the self-consistent isodensity PCM (SCI-PCM) of Wiberg et al. [20], as implemented in the GAUSSIAN-94 program [13], was used. In this method the solute cavity is determined self-consistently. The dielectric constant of the protein is the main empirical parameter of these models. In the studies discussed below it was normally set to 4, in line with previous suggestions for proteins. This value corresponds to a dielectric constant of about 3 for the protein itself and 80 for the water medium surrounding the protein. A major question in the modeling discussed below for metal enzymes
concerns which charge state to use for the active site complex. This question was discussed for several different modeling examples in a recent review [6]. It was concluded that neutral models are in general preferable for metal complexes in the low-dielectric environment of proteins. The view that these metal complexes are best considered neutral is also common among experimentalists [21]. Iron dimer complexes, discussed in detail below, illustrate common situations in enzymes. For methane monooxygenase (MMO) there are very good experimental indications that the iron dimer complexes involved are all neutral. For example, normal charge counting for the reduced Fe2(II,II) complex, with four carboxylates, two imidazoles and one water ligand, gives a neutral complex. The same is true for the closely related oxidized Fe2(III,III) complex of ribonucleotide reductase (RNR) [22]. There, in addition to the ligands listed above for reduced MMO, there are a μ-oxo bridge and another water ligand. Recently, X-ray structures of RNR without the iron centers have shown that, even then, the metal-binding regions remain neutral for several different mutants [23].
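The charge counting for the MMO and RNR iron dimers described above can be written out explicitly. The sketch assumes the usual formal charges for the ligands: carboxylate -1, neutral imidazole and water 0, and μ-oxo (O2-) -2. The ligand sets are those listed in the text.

```python
# Usual formal ligand charges (standard assumption for "normal charge counting").
LIGAND_CHARGE = {"carboxylate": -1, "imidazole": 0, "water": 0, "mu-oxo": -2}


def complex_charge(metal_oxidation_states, ligands):
    """Net charge = sum of metal oxidation states + sum of formal ligand charges."""
    return sum(metal_oxidation_states) + sum(
        LIGAND_CHARGE[name] * count for name, count in ligands.items())


# Reduced MMO: Fe2(II,II), four carboxylates, two imidazoles, one water.
mmo_red = complex_charge([2, 2], {"carboxylate": 4, "imidazole": 2, "water": 1})
# Oxidized RNR: Fe2(III,III), same ligands plus a mu-oxo bridge and a second water.
rnr_ox = complex_charge([3, 3], {"carboxylate": 4, "imidazole": 2,
                                 "water": 2, "mu-oxo": 1})
print(mmo_red, rnr_ox)  # 0 0
```

In both cases, the positive charge from the two iron ions is cancelled exactly by the anionic ligands. In RNR, the μ-oxo bridge compensates for the two extra units of metal charge.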
3. FORMATION OF O2
Only one system in nature can form an O-O bond from water using visible light: Photosystem II, found in green plants, algae and cyanobacteria. The overall reaction is given by (4),

$$ 2\,\mathrm{H_2O} + 4\,h\nu \longrightarrow 4\,\mathrm{H^+} + 4\,e^- + \mathrm{O_2} \tag{4} $$
where the photon wavelength is 680 nm. All attempts to reproduce the chemistry of reaction (4) with laboratory model compounds have so far been unsuccessful. In PSII, the formation of O2 is catalyzed by the water-oxidizing complex (WOC). No X-ray crystal structure of the WOC exists yet, so essentially all structural information about the complex is derived from EXAFS and EPR studies. Each PSII is known to contain four manganese atoms, one calcium atom and one chloride ion. Manganese and calcium are essential for O2 formation. Calcium can only be replaced by strontium [24], whereas chloride can be replaced by a variety of ions and can even be removed entirely without totally suppressing the activity [25]. Several ideas about the structure of the WOC exist. The leading suggestion based on EXAFS has been a complex with two loosely coupled bis-μ-oxo Mn dimers [26], but a more tightly coupled complex is not ruled out. EPR generally seems to favor tight complexes
[27, 28], but other interpretations also exist [29]. Strontium EXAFS has been interpreted as showing two short strontium-manganese distances, which implies the same for calcium in the native complex [30]. Apart from the direct structural information about the WOC, several other pieces of critical information exist on which a model of O2 formation can be built. O2 is formed in four steps, each involving absorption of a photon that leads to a charge separation in the reaction center [31]. These steps define the so-called S-states of the WOC. The system starts in S0 and proceeds up to S4, where O2 is evolved before the system returns to start the next cycle. The resting state of the enzyme is S1, in which the WOC is EPR-inactive and mostly interpreted to be in an Mn4(III,III,IV,IV) state [26]. One of the most important findings for the mechanism of O2 formation is that a neutral tyrosyl radical, TyrZ•, is formed at the beginning of each S-state transition, following reduction of P680+ in the reaction center [32]. In each S-state, the TyrZ radical is then re-reduced to a neutral tyrosine, simultaneously with oxidation of the WOC. The mechanism of this process is currently under debate. Regardless, this is one of the most important experimental findings for the mechanism, since it gives a direct energetic criterion for the oxidation chemistry. It means that the energy available to the water-oxidizing complex in each step is approximately equal to the TyrZO-H bond strength, 86.5 kcal/mol. This amount can be modified, but only slightly, by changes in the charge of the cluster and in hydrogen bonding during the S-state transitions. Two leading models exist for the re-reduction of TyrZ.
The first model, by Babcock et al. [32], is termed the hydrogen abstraction mechanism (see Figure 1a). In it, the tyrosyl radical obtains both the proton and the electron from the manganese complex in a concerted hydrogen atom transfer step. This requires that the O-H bond strengths of tyrosine and of water coordinated to manganese are about the same. In the second model, termed the electron transfer model (see Figure 1b), the tyrosyl radical obtains the electron from the manganese complex and the proton from a nearby base. In that model, the water molecules that eventually form O2 lose their protons to a different base. Junge et al. [33] have recently elaborated this model further and introduced some new aspects and modifications of the proton translocation mechanism.

Figure 1: Schematic picture of the hydrogen abstraction scheme (a) and the electron transfer scheme (b).

The first B3LYP study of possible PSII mechanisms tested the energetic feasibility of the hydrogen abstraction model [34]. Both monomeric and dimeric manganese model systems were studied, but only 5-coordinated complexes were considered. Coordination to a manganese center was found to lower the first O-H bond strength of water to a value 0.2 kcal/mol below that of tyrosine. The second hydrogen abstraction energy was quite similar. Since the hydrogen abstraction model requires the reaction to be thermoneutral (or weakly exothermic), these calculations are in accord with this model. It should be added that the results are not inconsistent with the electron transfer model either. Later studies using 6-coordinated complexes have given a somewhat different picture. For all the systems tried, the energy to form terminal Mn-O oxo bonds has been found to be too high (by >10 kcal/mol) compared with the TyrO-H
bond strength [35, 36, 37]. Several B3LYP studies have examined the mechanism by which the WOC forms the O-O bond. In the initial study [38], preformed terminal Mn(V)=O oxo groups were brought towards each other to form an O-O bond. In the same study, a terminal Mn(V)=O oxo group was also moved towards a terminal Mn-OH hydroxo group to form an Mn-OOH ligand. In other, unpublished work, attempts were also made to form an O2 molecule from two bridging μ-oxo oxygens. Models with both 5- and 6-coordinated manganese centers were investigated. All these model reactions gave very high barriers (above 25 kcal/mol), in contrast to the barrier of about 10 kcal/mol found experimentally for PSII. The high barriers had a common cause in all cases tried: the difficulty of reaching a point where an oxyl group (oxygen radical) is formed. Very early in each reaction, the Mn=O oxo bond was always promoted to an Mn-O• oxyl group, which cost too much energy for the model complexes tried. Later studies of O-O bond formation therefore focused on creating oxyl radicals at a sufficiently low energy cost. After extensive B3LYP investigations, an oxyl-radical mechanism for O-O bond formation in PSII was formulated [39]. The suggested mechanism includes several new ideas, which were proposed and tested. First, general spin-state considerations led to the conclusion that formation of O2 most probably requires preformation of an oxyl radical. This agrees with the experience from the initial search for transition states described above. The reasoning was as follows. In a typical weak-ligand-field redox reaction in which at least one metal atom changes oxidation state, the ground-state spin changes. The positions of the excited states before and after the reaction are therefore critical. For a
low-barrier reaction, one of two excited states has to be low lying. Either the excited state of the reactant that corresponds to the product ground state (the high-spin state) must be low lying, or the excited state of the product that corresponds to the reactant ground state (the low-spin state) must be. For water oxidation, the reactant excited state (before O-O bond formation) is expected to be an oxyl radical, since the ground state has a rather weak π-bond to the oxo ligand. This oxyl radical should be very reactive and ideal for forming the O-O bond. The product excited state, on the other hand, is just a recoupling of the d-shell, which should not help O-O bond formation. It is therefore the excited state of the reactant that has to be low
lying. All model calculations point in the same direction. In fact, for a sufficiently low barrier, the oxygen radical state of the reactant has to be prepared before the step in which the O-O bond is formed, which comes after S3. No oxidation of manganese should therefore occur in the S2 to S3 transition. Subsequent studies have furthermore shown that the oxyl radical also appears on the low-spin surface of the reactant. The creation of the oxyl radical therefore cannot be avoided, whichever way the reaction occurs. These spin-state arguments also hold for a complex of several antiferromagnetically coupled manganese centers, simply because the antiferromagnetic coupling is so weak in the relevant type of complexes. As an example of the size of the effect, the O-H bond strength of the hydroxyl ligand in the Mn2(IV,IV)-OH dimer is only 0.5 kcal/mol stronger for antiferromagnetic than for ferromagnetic coupling [34, 37]. The total O-H bond strength was found to be 85.0 kcal/mol, with an estimated error of 4 kcal/mol.
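The thermoneutrality requirement of the hydrogen abstraction model reduces to a bond-energy difference. In the sketch below, the enthalpy for transferring a hydrogen atom from a manganese-bound O-H group to the tyrosyl radical is approximated as D(substrate O-H) minus D(TyrZ O-H), using the values quoted in this section. Using the 4 kcal/mol error estimate as a tolerance, and neglecting charge and hydrogen-bonding corrections, are simplifying assumptions made here.

```python
# O-H bond strengths (kcal/mol) quoted in this section.
D_TYR_OH = 86.5                      # TyrZ O-H bond strength
D_MN_WATER_FIRST = D_TYR_OH - 0.2    # first O-H of Mn-bound water (5-coordinated models)
D_MN2_OH_AF = 85.0                   # Mn2(IV,IV)-OH, antiferromagnetic coupling
D_MN2_OH_F = D_MN2_OH_AF - 0.5       # same, ferromagnetic coupling
TOLERANCE = 4.0                      # assumed: the quoted error estimate of the calculations


def abstraction_enthalpy(d_substrate_oh: float, d_tyr_oh: float = D_TYR_OH) -> float:
    """dH for TyrZ* + H-O(Mn) -> TyrZ-OH + O(Mn)*: bond broken minus bond formed."""
    return d_substrate_oh - d_tyr_oh


for label, d in [("Mn-bound water", D_MN_WATER_FIRST),
                 ("Mn2(IV,IV)-OH, AF", D_MN2_OH_AF),
                 ("Mn2(IV,IV)-OH, F", D_MN2_OH_F)]:
    dh = abstraction_enthalpy(d)
    verdict = "thermoneutral within error" if abs(dh) <= TOLERANCE else "too endo/exothermic"
    print(f"{label:18s} dH = {dh:+.1f} kcal/mol ({verdict})")
```

A negative dH means the tyrosyl radical can abstract the hydrogen atom exothermically. With either spin coupling, the change is well within the error bar, consistent with the argument that weak antiferromagnetic coupling does not alter the conclusions.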
Figure 2: Proposed sequence of the S-states from S0 to S3 for oxygen radical formation in PSII. Protons removed are marked with *.

Figure 3: Optimized Mn3-model structure for the S3 oxygen radical state.
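The S-state cycle described above can be simulated in a few lines: the dark-stable S1 state as the starting point, one oxidizing equivalent per absorbed photon, and O2 evolved from the transient S4 state. This is an idealized sketch. It assumes that every flash advances the cycle by exactly one step and ignores misses and double hits.

```python
# Idealized S-state (Kok) cycle: each flash advances S_n -> S_(n+1);
# S4 is transient, releases O2 and returns to S0.
def o2_flashes(n_flashes: int, start_state: int = 1) -> list:
    """Return the 1-based flash numbers on which O2 is evolved."""
    state, released = start_state, []
    for flash in range(1, n_flashes + 1):
        state += 1                 # one photon -> one oxidizing equivalent
        if state == 4:             # S4 releases O2 and resets to S0
            released.append(flash)
            state = 0
    return released


print(o2_flashes(12))  # [3, 7, 11]
```

Starting from S1, the model gives the characteristic period-four pattern, with O2 first released on the third flash.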
Building on the previous study, which used a complex with only one manganese and one calcium center [39], an oxyl-radical mechanism has recently been suggested based on more realistic model complexes [37]. These complexes were constructed from the available experimental information, mainly from EXAFS and EPR (see above). They contain three tightly bound manganese centers and a calcium center with two short distances to manganese. No position could yet be suggested for the fourth, less tightly bound manganese center, so this center was left out of the model. The type of model complex used is shown in Figure 2, where a tentative position of the fourth manganese is also indicated. The figure shows the sequence of S-states suggested by the model study. The model complex contains a central cube with an empty corner, and the essential chemistry is suggested to occur in this cube. In the S1 state, the corners are formed by two Mn, one Ca, two μ-oxo groups and two waters. Water oxidation is suggested to occur by removing protons from the two waters and from a μ-hydroxo group. All computed O-H bond strengths fulfill the requirement of being close to that of tyrosine, which is the most demanding requirement for the water oxidation chemistry. Calcium has an important chelating role in these processes and makes the O-H bonds weak enough for abstraction by the tyrosyl radical. In the S3 state the oxyl radical, required for O-O bond
formation, is located in a bridging position in the lower left corner of the cube in Figure 2. This assignment, and the suggestion that manganese is not oxidized in the S2 to S3 transition, agree with previous suggestions based on EXAFS, XANES [26], EPR [40] and NMR [41] experiments. They differ, however, from suggestions based on other XANES experiments [42, 43]. The optimized S3-state structure for the Mn3 model complex is shown in Figure 3. Spins larger than 0.10 are marked in the figure; they are strongly localized on four centers: the three manganese centers and the oxyl radical center, which has a spin population of 0.90.
Figure 4: One suggested possibility for O2 formation during the S3 to S0 transition.

For the Mn3 model complex, no transition state geometry for O-O bond formation has yet been obtained. However, a suggested mechanism is shown in Figure 4. In this mechanism, an external water molecule enters the originally empty corner of the cube. This water loses a hydrogen atom to the oxyl radical and then forms an O-O bond to a hydroxyl group bridging manganese and calcium. Simultaneously, a proton moves to a bridging hydroxyl group in another corner of the cube. At this point, the final H+,e- transfer to the tyrosyl radical occurs, a water molecule enters and O2 is released. This mechanism is consistent with recent important solvent exchange experiments with labeled oxygen by Messinger et al. [44]. These experiments showed that the O-O bond is formed from one oxygen that exchanges rapidly with solvent water and one that exchanges more slowly. The rapidly exchanging oxygen should come from the external water, and the slowly exchanging oxygen from the bridging hydroxyl group. The computed binding energy of 10.6 kcal/mol for the external water in the S3 state is very close to the value of 12 kcal/mol estimated from the same experiments.

In the original study of the oxyl-radical mechanism for O2 formation, an approximate transition state structure was reported [39]. It was obtained by freezing the O-O coordinate at different values and looking for the highest energy point. Other degrees of freedom were investigated in a similar fashion. Recently, a fully optimized structure was obtained using a computed Hessian, with an imaginary frequency corresponding to O-O bond formation. This structure is shown in Figure 5.

Figure 5: Transition state structure for O-O bond formation in a simple model of the water-oxidizing cluster in photosystem II.

In this model, a terminal oxyl radical is formed in S3, and the O-O bond is formed to an external water. As the transition state structure shows, O-O bond formation occurs simultaneously with two other events: a proton transfer from the external water to a terminal hydroxide, and an electron transfer from oxygen to manganese. The electron transfer character of the transition state can be seen in the spins on oxygen and manganese. These lie halfway between the values for the reactants (Mn spin = 3.0) and the products (Mn spin = 4.0). The spin on the oxygens is mostly located on the outer oxygen. Overall, this type of transition state closely resembles those found for all the O-O bond cleavage processes discussed below. Finding a corresponding transition state for the more realistic Mn3 model has turned out to be very difficult, and work is still in progress.
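The statement that the transition-state spins lie halfway between the reactant and product values can be turned into a rough measure of how much charge has been transferred. This sketch interpolates the manganese spin population linearly. It assumes a transition-state Mn spin of 3.5, the "halfway" value implied by the text (the actual computed number is not given here). It also assumes that the spin tracks the transferred electron linearly, so it is only a crude diagnostic.

```python
def transferred_fraction(spin_ts: float, spin_reactant: float, spin_product: float) -> float:
    """Fraction of one electron transferred to Mn, by linear interpolation of its spin."""
    return (spin_ts - spin_reactant) / (spin_product - spin_reactant)


# Reactant Mn spin 3.0 and product Mn spin 4.0 as quoted; 3.5 is an assumed TS value.
print(transferred_fraction(3.5, 3.0, 4.0))  # 0.5
```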
4. O-O BOND CLEAVAGE
4.1. O-O bond activation in cytochrome oxidase

Cytochrome oxidase is the terminal enzyme in the respiratory chain and is located in the mitochondrial or bacterial membrane of all aerobic organisms. The driving force of respiratory electron transfer is the reduction of molecular oxygen to water, which occurs in cytochrome oxidase. The exergonic reduction of O2 is coupled to proton translocation across the membrane, producing a proton gradient that is used to make ATP. These respiratory processes are very efficient: about 80% of the O2 reduction energy is estimated to be stored as ATP. The cytochrome oxidase enzyme has four metal centers: two copper centers, labeled CuA and CuB, and two heme iron centers, labeled heme a and heme a3. Two of these, CuB and heme a3, are only about 5 Å apart (metal-to-metal distance) and are therefore called the binuclear center. The binuclear center constitutes the active site for O2 activation. The X-ray structures of both a mammalian [45] and a bacterial [46] cytochrome oxidase have been determined, and the structures around the binuclear center are very similar in the two species. Figure 6 shows the binuclear center of the bovine heart enzyme [45].

Figure 6: X-ray structure of the cytochrome oxidase binuclear center from bovine heart.

Although the experimental information on cytochrome oxidase is very rich, the molecular details of O2 reduction and of its coupling to proton translocation are not well understood. A possible cycle for the O2 reduction process,

$$ \mathrm{O_2} + 4\,\mathrm{H^+} + 4\,e^- \longrightarrow 2\,\mathrm{H_2O} \tag{5} $$
is shown in Figure 7. The cycle includes six intermediates that have been suggested on the basis of different experimental data; it should be emphasised that their exact molecular structures are not known. In particular, compound P in Figure 7 was originally suggested to have an Fe-O-O-H peroxide structure [47, 48, 49], whereas newer experimental data [50, 51, 52, 53, 54] indicate that the O-O bond is already cleaved in P. A main motivation for quantum chemical calculations on O-O bond cleavage in cytochrome oxidase was to determine whether the O-O bond can be cleaved before the so-called third electron (see Figure 7) enters the binuclear center [55, 56].

Choosing a model for the active site of cytochrome oxidase is unusually difficult. Apart from the ordinary question of the molecular size of the model, there are questions about the flow of electrons and protons in the reduction process. In fact, determining the mechanism of O-O bond cleavage in cytochrome oxidase largely amounts to deciding how many electrons and how many protons are involved in each step of the reaction. Furthermore, the energetic criteria for evaluating different mechanisms must differ somewhat from those normally used in quantum chemical studies. For example, since the number of protons and electrons differs between steps, absolute energies cannot be used to identify the most likely reactant or product structure. Only reaction energies can be used to compare different mechanisms. As mentioned above, the enzyme is very efficient in storing the released energy as ATP. Very little energy is therefore wasted as heat, which in turn means that all reaction steps must be close to thermoneutral. In addition, the experimental reaction rates show that no reaction step can have a high barrier; the activation enthalpy of the O-O cleavage step has been measured to be only 6.4 kcal/mol [57]. This energetic information should guide the search for likely reaction mechanisms.

Figure 7: Possible catalytic cycle for O2 reduction in cytochrome oxidase.

To model the binuclear center shown in Figure 6, all histidines are replaced by imidazoles, the tyrosine by a phenol and the heme group by an unsubstituted iron porphyrin. Models of this size can be used for certain intermediates, where the copper and
iron complexes can be treated separately. However, transition states, and also the relative energies of most intermediates, require a combined model of the binuclear center. In that model, all imidazoles except the one covalently cross-linked to the tyrosine are replaced by ammonia, and the porphyrin ring is replaced by two chelating diformamidate (NHCHNH-) ligands. Most calculations in the study were thus performed on models of 50-55 atoms, which is a practical maximum given the number of different structures that have to be treated. The most important results of this quantum chemical study [55, 56] are summarized below.

Figure 8: Reaction scheme for overall O2 bond cleavage in cytochrome oxidase.
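The barrier criterion used above can be made more concrete with the Eyring equation of transition state theory, k = (k_B T/h) exp(-ΔG‡/RT). The sketch treats a barrier as an activation free energy, thereby ignoring activation entropy; this is a simplifying assumption. It compares the measured activation enthalpy of 6.4 kcal/mol [57] with a barrier of 25 kcal/mol.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34   # Planck constant, J*s
R_KCAL = 1.987204e-3        # gas constant, kcal/(mol*K)


def eyring_rate(barrier_kcal: float, temperature: float = 298.15) -> float:
    """First-order rate constant (1/s), treating the barrier as an activation free energy."""
    prefactor = K_B * temperature / H_PLANCK          # about 6.2e12 1/s at 298 K
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))


for barrier in (6.4, 25.0):
    k = eyring_rate(barrier)
    print(f"barrier {barrier:5.1f} kcal/mol -> k = {k:.1e} 1/s, "
          f"half-life = {math.log(2) / k:.1e} s")
```

Under this crude assumption, a 25 kcal/mol barrier gives a half-life of days, whereas 6.4 kcal/mol gives nanoseconds. The second number should not be read as the actual rate of the step, since entropy is neglected. The comparison nevertheless shows why barriers of that height are incompatible with a catalytic cycle.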
The O2 molecule coordinates to the reduced Fe(II), Cu(I) form of the enzyme, labeled R in Figure 7, forming compound A. As mentioned above, it has recently been suggested that the O-O bond is already cleaved at this so-called two-electron level of the enzyme. To cleave the O-O bond, four electrons formally have to be transferred from the binuclear center to the O2 molecule. One electron can be taken from copper, forming Cu(II), and two can at this point be taken from iron, forming an Fe(IV)=O species. Several sources are possible in principle for the fourth electron. Copper is one, but this would lead to a rather unlikely Cu(III) species. The porphyrin ring of heme a3 is another, but this contradicts spectroscopic data that show no signs of a porphyrin π-radical in compound P. Instead, it was recently suggested that the fourth electron comes from the tyrosine covalently cross-linked to one of the copper-ligated histidines [54]. This would yield a tyrosyl radical in the bond-cleaved product P, as indicated in Figures 7 and 8. Through the histidine cross-link, the tyrosine residue is perfectly located, close enough to the binuclear center to possibly deliver both a proton and an electron to cleave the O2 molecule. It was therefore suggested that a hydrogen atom transfer from tyrosine to a bridging Fe-O-O-Cu structure
could occur concerted with O-O bond cleavage [54]. In preliminary calculations that treated the iron and copper complexes separately, the overall reaction energy for such an O2 bond cleavage mechanism, as given by the reaction scheme in Figure 8, was found to be very close to zero; that is, the thermoneutrality criterion discussed above seems to be fulfilled for this mechanism. To investigate the second criterion for a likely O-O cleavage mechanism, a transition state with a low enough activation energy has to be located. Following the suggested bridging Fe-O-O-Cu bond cleavage mechanism directly turned out to give too high a barrier, more than 25 kcal/mol, and a very large number of calculations were performed to investigate different ways to lower the bond-breaking activation energy. Eventually a mechanism, described below, that agrees well with experimental data was found. Two factors were found to be important for a low O-O bond cleavage barrier: first, a water molecule has to be present at the binuclear center, and second, one of the residues in the vicinity of the cross-linked tyrosine needs to be protonated.
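The barrier criterion can be made concrete with conventional transition state theory. The sketch below assumes the Eyring equation with a transmission coefficient of 1 and a temperature of 298.15 K, neither of which is specified in the text; the function names are illustrative. It converts a free energy barrier into a first-order rate constant and a half-life.

```python
import math

# Physical constants (SI units; exact SI values for k_B and h)
K_B = 1.380649e-23          # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34   # Planck constant, J*s
R_KCAL = 1.987204e-3        # gas constant, kcal/(mol*K)


def eyring_rate(barrier_kcal: float, temperature: float = 298.15) -> float:
    """First-order rate constant (1/s) for a free energy barrier given in kcal/mol.

    Conventional transition state theory, transmission coefficient assumed to be 1.
    """
    prefactor = K_B * temperature / H_PLANCK  # about 6.2e12 1/s near room temperature
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))


def half_life(barrier_kcal: float, temperature: float = 298.15) -> float:
    """Half-life (s) of a first-order step governed by the given barrier."""
    return math.log(2.0) / eyring_rate(barrier_kcal, temperature)


if __name__ == "__main__":
    # Barrier heights of the size discussed in this section, kcal/mol
    for barrier in (14.5, 20.0, 25.0):
        print(f"{barrier:5.1f} kcal/mol: k = {eyring_rate(barrier):.2e} 1/s, "
              f"t1/2 = {half_life(barrier):.2e} s")
```

Under these assumptions a 14.5 kcal/mol barrier corresponds to a rate constant of order 10² s⁻¹ (a half-life of a few milliseconds), whereas a barrier above 25 kcal/mol corresponds to a half-life of days, which is why such a barrier is regarded as too high.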
Figure 9: Reaction scheme for initial water cleavage in cytochrome oxidase.
Spectroscopic data indicate that a water molecule is present at the binuclear center, near CuB [58]. In the investigation of the originally proposed bridging Fe-O-O-Cu bond-breaking mechanism, a water molecule was introduced simply to let the phenolic hydroxyl group of the cross-linked tyrosine reach the bridging O2 molecule. As mentioned above, the bridging Fe-O-O-Cu mechanism gave too high a barrier and was therefore abandoned. An alternative way to use a water molecule in the initial phase of the reaction is to let it deliver a proton to the O2 molecule. The calculations show that protonation of the O2 molecule at this stage initiates an electron transfer from CuB to the heme a3 complex, and the products of such a reaction, which is shown in the reaction scheme
Figure 10: Optimized transition state structure for the initial water cleavage in cytochrome oxidase.
in Figure 9, are thus a non-bridging Fe-O-OH peroxide and a Cu(II)-OH complex. The reaction in Figure 9 is calculated to be exothermic by 1.9 kcal/mol. The transition state for this water-splitting reaction was determined, and the structure obtained is shown in Figure 10. The calculated activation enthalpy is 8.4 kcal/mol, in good agreement with the experimentally determined value of 6.4 kcal/mol [57]. The calculations further show an unusually large entropy effect on this water-splitting reaction, of about 6 kcal/mol, as discussed further below. Figure 10 shows that this water-splitting process can be described as a typical hydrogen atom transfer reaction: in the transition state the spin populations are halfway between those of the reactant and the product, and the proton is on its way from the copper-ligated water to the iron-coordinated O2. The mechanism for O-O cleavage that is initiated by the splitting of
a water molecule is labeled the water-assisted mechanism, and the next reaction step to be discussed is the actual O-O bond-breaking step of this mechanism. Starting from the Fe(III)-O-O-H peroxide structure formed by the water splitting described above, the O-O bond cleavage reaction yields an oxo-ferryl and a tyrosyl radical, as shown in the reaction scheme in Figure 11. This reaction is calculated to be exothermic by 6.4 kcal/mol. An approximate transition state for the O-O cleavage step of Figure 11 was determined; its structure is shown in Figure 12. The corresponding activation energy, however, is calculated to be at least 20 kcal/mol, which is too high compared to the experimental O-O activation energy of 6.4 kcal/mol [57]. This reaction step is thus ruled out.
Figure 11: Reaction scheme for water-assisted O-O bond cleavage in cytochrome oxidase.
Experimental data indicate that another proton could be available at the binuclear center in the reduced form of the enzyme (R in Figure 7) [59]. Such a proton could enter the binuclear center via the so-called K-channel, which ends at the hydroxyl group of the heme a3 farnesyl side chain. This farnesyl hydroxyl group is in turn hydrogen bonded to the hydroxyl group of the cross-linked tyrosine, as indicated by the crystal structure (Figure 6). An extra proton in the vicinity of the tyrosine hydroxyl group may well facilitate the hydrogen atom transfer from the tyrosine to the Fe-O-OH peroxide that occurs in the O-O bond cleavage step, and could thereby decrease the activation energy of this step. The protonation in the vicinity of tyrosine is modeled by an H3O+ ion, as shown in the reaction scheme in Figure 13, and this O-O bond cleaving mechanism is labeled the water- and proton-assisted mechanism. The reaction energy of the scheme in Figure 13 is the most difficult one to determine, mainly because the model used has difficulty describing the product of the reaction, as discussed further below. The reaction energy of the scheme in Figure 13 is therefore estimated to be the same
Figure 12: Approximate transition state structure for the water-assisted O2 cleavage in cytochrome oxidase.
as that of the scheme in Figure 11, i.e. 6.4 kcal/mol exothermic. The optimized O-O bond cleavage transition state for this mechanism is shown in Figure 14, and it turns out that the extra proton has important effects in this region of the potential energy surface. The barrier is drastically decreased, from more than 20 kcal/mol without the proton to about 1 kcal/mol with the proton available. The entropy effects on this reaction step are small. The mechanistic effect of the extra proton is that, in the transition state region, it replaces the tyrosine hydroxyl proton when that proton is transferred to the distal oxygen of the Fe-O-OH group, as can be seen in the transition state structure in Figure 14. The O-O bond cleavage reaction results in a change of spin state on the iron complex. The Fe(III)-O-OH reactant is a doublet with only one unpaired spin, located on iron, while the Fe(IV)=O product is a triplet, which can be high-spin or low-spin coupled to the radical created in the reaction. The electronic structure at the transition state is indicated in the figure using the unpaired
spin populations. These show that the product radical (suggested above to be located on the tyrosine) is not created until after the barrier is passed; instead, there is an internal redistribution of the electrons in the Fe-O-OH+ unit, which increases the spin on iron from 1.0 in the reactant to 1.30 in the transition state structure. The increased spin on iron is essentially balanced by the emerging spin populations of -0.11 and -0.16 on the two oxygens.
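The statement that the extra spin on iron is balanced by the oxygens can be checked with simple bookkeeping. The sketch below uses only the spin populations quoted above; which oxygen carries -0.11 and which -0.16 is not specified in the text, so the labels O1 and O2 are placeholders, and contributions from all other atoms are ignored.

```python
# Spin populations (unpaired electrons) quoted in the text for the iron-peroxide unit.
# "O1"/"O2" are placeholder labels: the text does not say which oxygen carries which value.
reactant_spins = {"Fe": 1.00}
ts_spins = {"Fe": 1.30, "O1": -0.11, "O2": -0.16}


def net_spin(populations):
    """Sum of atomic spin populations over a fragment (about 1 for one unpaired electron)."""
    return sum(populations.values())


def spin_shift(before, after, atom):
    """Change in spin population on one atom between two structures."""
    return after.get(atom, 0.0) - before.get(atom, 0.0)


if __name__ == "__main__":
    gain_on_fe = spin_shift(reactant_spins, ts_spins, "Fe")
    oxygen_total = ts_spins["O1"] + ts_spins["O2"]
    print(f"Fe gains {gain_on_fe:+.2f}, oxygens carry {oxygen_total:+.2f}, "
          f"fragment total {net_spin(ts_spins):.2f}")
```

The iron gains about +0.30 while the two oxygens carry about -0.27, so the fragment still holds roughly one unpaired electron, consistent with a doublet in which no product radical has yet formed.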
Figure 13: Reaction scheme for water- and proton-assisted O-O bond cleavage in cytochrome oxidase.
The product of the water- and proton-assisted mechanism has been suggested to contain a tyrosyl radical, as indicated in Figure 13. As mentioned above, the radical is created at a late stage of the O-O bond-breaking process and is therefore not directly involved in the bond cleavage. In fact, in the model used to study the water- and proton-assisted mechanism, the product radical ends up on the porphyrin ring, which is contrary to spectroscopic data for the P state of the enzyme. However, previous calculations [60] show that the ionization energy of tyrosine is significantly decreased by hydrogen bonding to a strong base, and the water molecule used in the model is a much weaker base than the farnesyl hydroxyl group, which is directly connected to the delocalized π system of the porphyrin ring. Therefore, the product of the O-O bond cleavage cannot be studied using the same simplified model as was used to study the activation barrier. Calculations on improved models of the product, including the full porphyrin and the hydrogen bond between the farnesyl hydroxyl and the tyrosine, show that in such a model the radical shifts towards the tyrosine, with about 30% of the radical on the tyrosine. These results indicate that the exact location of the product radical results from a delicate balance between the ionization potentials of different fragments of the binuclear center, and this balance is extremely difficult to
reproduce in the calculations. They furthermore indicate that the location of the radical is not of large energetic importance, since the different options have very similar energies. Together with recent experimental data [61, 62], the energetic results of the calculations described above suggest that the cross-linked tyrosine is a likely site for the product radical in the O2 cleavage reaction.
Figure 14: Optimized transition state structure for the water- and proton-assisted O2 cleavage in cytochrome oxidase.
In summary, an energetically feasible reaction mechanism for O-O bond cleavage in cytochrome oxidase has been found. The requirements of this mechanism are that a water molecule is available at the binuclear center and that one of the residues at the binuclear center is protonated. Both requirements are in accordance with experimental observations [58, 59]. The calculations together with the X-ray structure suggest that
the hydroxyl group of the heme a3 farnesyl side chain plays a role in such a protonation of the binuclear center. The calculations furthermore show that with this mechanism the O-O bond can be cleaved with only the electrons of the reduced enzyme (R in Figure 7) available at the binuclear center. The energetic results can be summarized as a potential energy surface for the transition from A, Fe(II)-O2, to P, Fe(IV)=O; see Figure 15. As mentioned above, the calculations show large entropy effects on the initial water-splitting step. Therefore the free energy is used to construct the potential energy surface for the O-O cleavage reaction, which differs from the general procedure in this review, where relative enthalpies are used in energetic comparisons.
Figure 15: Calculated potential energy surface (free energy, kcal/mol) for the complete O2 cleavage process in cytochrome oxidase.
The potential energy surface in Figure 15 shows that the initial water-splitting step, with a calculated barrier of 14.5 kcal/mol, is rate limiting. This result is in quite good agreement with experimental lifetime measurements, which indicate a free energy barrier of 12.4 kcal/mol. Furthermore, the activation enthalpy has been determined experimentally,
from the slope of the ln(k) versus 1/T line, to be 6.4 kcal/mol [57]. As mentioned above, the calculated activation enthalpy for the water splitting is 8.4 kcal/mol, again in very good agreement with the experimental value. These two comparisons show that the calculated entropy effect on the rate-limiting step, 6.1 kcal/mol, is in fortuitously good agreement with experiment (12.4 - 6.4 = 6.0 kcal/mol). The experiments also show a kinetic isotope effect in D2O of 2.2 [57], and the corresponding calculated value is 3.9. Finally, time-resolved resonance Raman measurements show that compound P appears at the same rate as compound A disappears, which indicates that there is no stable intermediate between A and P [54]. The potential energy surface in Figure 15, with the peroxide intermediate well above the reactant in energy and only a small or no barrier for the O-O cleavage step, is thus in good agreement with that observation as well. In Figure 15 the overall transformation from compound A to compound P is indicated to be exothermic by 0.9 kcal/mol. This value is obtained using the model of Figure 9 for the last step and should not be considered very accurate. However, it is in good agreement with the requirement that the reaction should be close to thermoneutral, with only a small driving force.
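The comparison between calculated and experimental barriers rests on the relation ΔG‡ = ΔH‡ - TΔS‡ and on extracting ΔH‡ from the temperature dependence of the rate. The sketch below generates noise-free synthetic rate constants from the experimental values quoted above (ΔH‡ = 6.4 kcal/mol, ΔG‡ = 12.4 kcal/mol, at an assumed 298.15 K) and recovers ΔH‡ from the slope of ln(k/T) versus 1/T (an Eyring plot). It also fits ln(k) versus 1/T (an Arrhenius plot), whose slope gives Ea ≈ ΔH‡ + RT rather than ΔH‡ itself; the text does not say which convention was used in [57]. The temperature range and helper names are assumptions.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34   # Planck constant, J*s
R_KCAL = 1.987204e-3        # gas constant, kcal/(mol*K)


def eyring_k(dh: float, ds: float, t: float) -> float:
    """Rate constant (1/s) from activation enthalpy dh (kcal/mol) and entropy ds (kcal/(mol*K))."""
    return (K_B * t / H_PLANCK) * math.exp(ds / R_KCAL - dh / (R_KCAL * t))


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


T0 = 298.15
DH_EXP, DG_EXP = 6.4, 12.4            # experimental values quoted in the text, kcal/mol
DS_EXP = (DH_EXP - DG_EXP) / T0       # from dG = dH - T*dS at T0

# Synthetic, noise-free rate constants between 5 and 45 degrees C
temps = [278.15 + 5.0 * i for i in range(9)]
rates = [eyring_k(DH_EXP, DS_EXP, t) for t in temps]
inv_t = [1.0 / t for t in temps]

# Eyring plot: ln(k/T) versus 1/T has slope -dH/R
dh_fit = -R_KCAL * slope(inv_t, [math.log(k / t) for k, t in zip(rates, temps)])
# Arrhenius plot: ln(k) versus 1/T has slope -Ea/R, with Ea close to dH + RT
ea_fit = -R_KCAL * slope(inv_t, [math.log(k) for k in rates])

minus_t_ds_exp = DG_EXP - DH_EXP      # entropic part of the experimental barrier
minus_t_ds_calc = 14.5 - 8.4          # same quantity from the calculated values in the text

print(f"dH (Eyring fit) = {dh_fit:.2f}, Ea (Arrhenius fit) = {ea_fit:.2f} kcal/mol")
print(f"-T*dS: experiment {minus_t_ds_exp:.1f}, calculation {minus_t_ds_calc:.1f} kcal/mol")
```

The last two quantities restate the point made in the text: -TΔS‡ is about 6.0 kcal/mol from experiment and 6.1 kcal/mol from the calculations.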
4.2. Heme peroxidases
Heme peroxidases constitute a class of enzymes that oxidize a variety of substrates using hydrogen peroxide. Examples are cytochrome c peroxidase (CCP), horseradish peroxidase (HRP) and ascorbate peroxidase (APX). The resting Fe(III) form of the enzyme reacts with hydrogen peroxide to form the so-called compound I, (heme)+Fe(IV)=O, which reacts further to oxidize the substrate. The initial step of the peroxidase cycle is thus the cleavage of the hydrogen peroxide O-O bond:
$$(\text{heme})\text{Fe(III)} + \text{H}_2\text{O}_2 \rightarrow (\text{heme})^{+}\text{Fe(IV)=O}\ (\text{compound I}) + \text{H}_2\text{O} \qquad (6)$$
The axial (proximal) ligand of heme peroxidases is, just as in cytochrome oxidase described above, a histidine, and in the distal pocket there is a highly conserved histidine. Site-directed mutagenesis on CCP and HRP has shown that the distal histidine plays an essential role in the formation of compound I. On the basis of these observations and the crystal structure of the enzyme, a mechanism for compound I formation has been suggested [63] (see Figure 16), in which the O-O bond cleavage occurs as the final step, between c and d. In a recent quantum chemical study the energetic feasibility of such a mechanism for hydrogen peroxide activation was investigated [64, 65].
Figure 16: Suggested reaction scheme for hydrogen peroxide activation in heme peroxidases.
The models used to study compound I formation in heme peroxidases are similar to those used in the cytochrome oxidase study described above. Both the proximal and the distal histidines are modeled by imidazoles, and the heme porphyrin is replaced by two chelating (NH)(CH)3(NH) ligands. The size of the heme model is thus in between the large and the small models used in the cytochrome oxidase study. In the first set of calculations the proximal imidazole, which is hydrogen bonded to an aspartate residue, was considered to be protonated, which leads to a positively charged model
(+1). Using this positively charged model, a transition state for the O-O cleavage was determined. It turned out that the transition state was best approximated as the crossing point between two different spin surfaces. The reactant is a doublet state with only one unpaired spin, located on the heme iron, while the product is a quartet (or doublet) state, with the Fe(IV)=O unit in a triplet state, which in turn is high-spin (or low-spin) coupled to the porphyrin radical in the product compound I. At the crossing point, which has an O-O distance of 2.1 Å, the doublet-state spin population on iron has increased from 0.94 in the reactant to 1.59. The increased spin on iron is antiferromagnetically coupled to the emerging spin on the two oxygen atoms, which have spin populations of -0.30 and -0.35, respectively. At this point there is still very little spin on the porphyrin (0.08), showing that, just as in cytochrome oxidase, the product radical is formed only after the O-O bond cleavage region of the potential surface has been passed. The barrier for the O-O cleavage step calculated in this way is
found to be 10.4 kcal/mol. This value can be compared to an experimental activation energy of 6.5 kcal/mol, as calculated from rate measurements on compound I formation in CCP.
Figure 17: Optimized transition state structure for the hydrogen peroxide O-O bond cleavage in heme peroxidase.
To reach the reactant for the O-O bond cleaving step (c in Figure 16), a proton has to be transferred from the proximal peroxide oxygen to the histidine, which has to move into a position where it is hydrogen bonding to the distal oxygen. This process (a to c in Figure 16) is calculated to be endothermic by some 5 kcal/mol, which would lead to an even higher barrier for the formation of compound I, counted from the heme-coordinated hydrogen peroxide. However, the energetics of such a process (a to c) are very difficult to calculate, since the changes in the hydrogen bonding shown in the reaction scheme
could very well be coupled to other changes in the hydrogen bonding that are not included in the model calculations, for example changes involving an arginine close to the active site. The conclusion from the calculations is therefore that the suggested mechanism for O-O bond cleavage in heme peroxidases is feasible from an energetic point of view. Preliminary calculations were also performed for a neutral model, using an imidazolate to model the proximal histidine; see Figure 17. The results obtained were rather similar to those for the positively charged model described above, although with some minor differences for the actual O-O bond cleavage. First, relative to structure c in Figure 16 the activation energy is somewhat lower for the neutral model. Second, for the neutral model the barrier is reached entirely on the doublet surface, i.e. the curve crossing with the product quartet state occurs after the transition state. Finally, the neutral model has a somewhat shorter O-O bond distance at the transition state than the charged model. In a recent extension of this study, a true transition state optimization was performed for the O-O bond cleavage process using the neutral model [65]. The transition state thus obtained is shown in Figure 17. One important conclusion from these later calculations is that the approximate transition state, obtained by freezing the most important degree of freedom (the O-O distance) [64], is a good approximation of the fully optimized transition state [65].
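The idea of taking the crossing point between two spin surfaces as the approximate transition state, with the O-O distance as the frozen coordinate, can be sketched in one dimension. The two curves below are made-up analytic stand-ins for the relaxed doublet and quartet energies at each fixed O-O distance; their parameters are hypothetical and are not fitted to the calculations in [64, 65], so the crossing they give does not reproduce the 2.1 Å value quoted above. Bisection then locates the distance at which the two energies are equal.

```python
def doublet_energy(r: float) -> float:
    """Hypothetical reactant (doublet) curve: harmonic well around a peroxide O-O bond, kcal/mol."""
    return 40.0 * (r - 1.45) ** 2


def quartet_energy(r: float) -> float:
    """Hypothetical product-like (quartet) curve that falls as the O-O bond stretches, kcal/mol."""
    return 12.0 - 15.0 * (r - 1.45)


def crossing_point(f, g, lo: float, hi: float, tol: float = 1e-8) -> float:
    """Bisection for r in [lo, hi] where f(r) == g(r); requires a sign change of f - g."""
    d_lo = f(lo) - g(lo)
    if d_lo * (f(hi) - g(hi)) > 0:
        raise ValueError("curves do not cross inside the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        d_mid = f(mid) - g(mid)
        if d_lo * d_mid <= 0:
            hi = mid                 # sign change lies in [lo, mid]
        else:
            lo, d_lo = mid, d_mid    # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    r_x = crossing_point(doublet_energy, quartet_energy, 1.45, 2.5)
    print(f"crossing at r(O-O) = {r_x:.3f} Angstrom, E = {doublet_energy(r_x):.2f} kcal/mol")
```

In a real application each point on the curves would come from a constrained geometry optimization on the corresponding spin surface, and a proper minimum-energy crossing point search would relax the O-O distance as well.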
4.3. O-O bond activation in methane monooxygenase
Methane monooxygenases (MMOs) are a group of enzymes which convert methane to methanol via a monooxygenase pathway in which the dioxygen molecule is activated [66, 67, 68]. The overall reaction is given by (2),
$$\text{CH}_4 + \text{O}_2 + \text{NADH} + \text{H}^+ \rightarrow \text{CH}_3\text{OH} + \text{H}_2\text{O} + \text{NAD}^+ \qquad (7)$$
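As a quick consistency check of reaction (7), the sketch below counts atoms and charge on both sides. It assumes a simplified bookkeeping in which NADH is treated as NAD+ plus a hydride, with a hypothetical pseudo-element "Nad" standing for the cofactor skeleton; real formulas and phosphate charges are ignored. The check shows that the NADH/H+ pair supplies the two electrons and two protons needed to reduce the second oxygen atom of O2 to water.

```python
from collections import Counter

# Element counts and charge per species. "Nad" is a hypothetical placeholder for the cofactor
# skeleton, so NADH = Nad + H (neutral, i.e. NAD+ plus a hydride) and NAD+ = Nad (charge +1).
SPECIES = {
    "CH4": ({"C": 1, "H": 4}, 0),
    "O2": ({"O": 2}, 0),
    "NADH": ({"Nad": 1, "H": 1}, 0),
    "H+": ({"H": 1}, 1),
    "CH3OH": ({"C": 1, "H": 4, "O": 1}, 0),
    "H2O": ({"H": 2, "O": 1}, 0),
    "NAD+": ({"Nad": 1}, 1),
}


def totals(side):
    """Total atom counts and charge for a list of (coefficient, species) pairs."""
    atoms = Counter()
    charge = 0
    for coeff, name in side:
        elements, q = SPECIES[name]
        for element, n in elements.items():
            atoms[element] += coeff * n
        charge += coeff * q
    return atoms, charge


def is_balanced(lhs, rhs) -> bool:
    """True if both sides have identical atom counts and identical total charge."""
    return totals(lhs) == totals(rhs)


# Reaction (7): CH4 + O2 + NADH + H+ -> CH3OH + H2O + NAD+
LHS_7 = [(1, "CH4"), (1, "O2"), (1, "NADH"), (1, "H+")]
RHS_7 = [(1, "CH3OH"), (1, "H2O"), (1, "NAD+")]

if __name__ == "__main__":
    print("reaction (7) balanced:", is_balanced(LHS_7, RHS_7))
    # Without the NADH/H+ pair the second oxygen atom cannot be reduced to water
    print("without NADH + H+:", is_balanced(LHS_7[:2], RHS_7[:2]))
```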
The longest known MMOs are soluble proteins containing a binuclear iron active site. The MMO from Methylococcus capsulatus (Bath) consists of three proteins: a hydroxylase (protein A), a reductase (protein C), and a regulatory protein (protein B). The C component uses NADH as the primary reductant that provides the electrons necessary to reduce O2, the B component allows C to feed these electrons to A, and the A component interacts with O2 and carries out the substrate oxidation step. The X-ray structure of the diiron complex from Methylococcus capsulatus (Bath) in an Fe2(III,III)
Figure 18: X-ray structure of the iron dimer complex of methane monooxygenase from Methylococcus capsulatus (Bath).
state [69] is shown in Figure 18. Since there are four anionic carboxylates and two neutral histidines, and the bridging groups are likely to be two hydroxides, the complex is neutral. The mechanism of MMO, including the different states of the iron dimer complex, has been reviewed several times [66, 67, 68]. The lowest oxidation state of the diiron complex is Fe2(II,II), a loosely bound, ferromagnetically coupled dimer with a long Fe-Fe distance. This complex, termed O, reacts with O2 to form another complex, P, which is normally assigned to an Fe2(III,III) peroxide complex. One or more intermediates between O and P have been postulated [70]. In the next step, the dioxygen bond is cleaved and an unprecedented Fe2(IV,IV) complex termed Q is formed. The oxidation state assignment was based on Mössbauer spectroscopy [71]. Compound Q has been suggested to be the active oxidant that attacks methane.
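The charge argument for the Fe2(III,III) site is simple formal-charge arithmetic. The sketch below assumes the composition described in the text (two Fe(III), four carboxylates, two neutral histidines and two bridging hydroxides) and ignores any neutral water ligands, which would not change the result; the names are illustrative.

```python
# Formal charges (in units of e) of the components of the diiron site described in the text
FORMAL_CHARGE = {
    "Fe(III)": 3,
    "carboxylate": -1,   # deprotonated glutamate side chain
    "histidine": 0,      # neutral imidazole
    "hydroxide": -1,     # assumed bridging OH- (the text says the bridges are "likely" hydroxides)
}


def net_charge(composition):
    """Sum of formal charges for a composition given as {component: count}."""
    return sum(FORMAL_CHARGE[name] * count for name, count in composition.items())


# Oxidized site: two irons, four carboxylates, two histidines, two hydroxide bridges
OXIDIZED_SITE = {"Fe(III)": 2, "carboxylate": 4, "histidine": 2, "hydroxide": 2}

if __name__ == "__main__":
    print("net charge:", net_charge(OXIDIZED_SITE))
```

The six positive charges of the two ferric ions are exactly compensated by the four carboxylates and the two hydroxides, which is why the complex is described as neutral.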
The effect of solvent pH and deuteration on the transient kinetics of the key intermediates of the dioxygen activation process of MMO isolated from Methylosinus trichosporium (OB3b) has recently been studied by Lee and Lipscomb [70]. The O decay reaction was found to be pH-independent, whereas the P formation rate decreased sharply with increasing pH to near zero at pH 8.6. The decay rate of P matched the formation rate of Q, and both rates decreased sharply with increasing pH to near zero at pH 8.6. The results were interpreted to suggest that one proton is transferred in both the formation of P and of Q. If these protons are transferred to the bound oxygen molecule, the data are consistent with a model in which water is formed concurrently with the formation of Q, suggesting a heterolytic O-O bond cleavage mechanism similar to the one in the oxidases and peroxidases discussed above. A number of theoretical studies have been made on the mechanism of MMO, all of them using the B3LYP method. In the first DFT study of MMO a quite simple model of the iron dimer complex was used [72]. Instead of the actual glutamates and histidines, hydroxyl and water ligands were used. The number of hydroxyl ligands was chosen to give an overall neutral complex (see further section II) with the desired oxidation states. The study considered different intermediates from P to Q and suggested a mechanism for hydroxylation. Another B3LYP study by Basch et al. [73] used a model of similar size, but with amino groups modeling the histidines and two bridging carboxylates. A different type of model was used by Yoshizawa et al. [74], where an unsaturated iron dimer complex carried a net positive charge. These first B3LYP studies did not specifically address the O2 activation step but mainly considered the hydroxylation step, which is outside the scope of the present review. The cleavage of O2 has only been considered in the most recent B3LYP study [75].
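A single titratable proton controlling a rate gives the simplest possible pH profile: if only the protonated form reacts, the observed rate follows the protonated fraction given by the Henderson-Hasselbalch relation. The sketch below illustrates that shape only; it is not the analysis performed in [70], and the limiting rate constant and pKa are hypothetical placeholders.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Fraction of a single ionizable group in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def observed_rate(ph: float, k_max: float, pka: float) -> float:
    """Rate if only the protonated form reacts: k_obs = k_max * f_protonated(pH)."""
    return k_max * protonated_fraction(ph, pka)


if __name__ == "__main__":
    K_MAX = 1.0   # hypothetical limiting rate constant (arbitrary units)
    PKA = 7.0     # hypothetical pKa, chosen only for illustration
    for ph in (6.0, 7.0, 8.0, 8.6):
        print(f"pH {ph:.1f}: k_obs/k_max = {observed_rate(ph, K_MAX, PKA):.3f}")
```

With a single proton the rate falls by about a factor of ten per pH unit above the pKa, which is the kind of steep decrease toward zero described for the P formation and P decay rates.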
In this study, a structure with an asymmetric molecular O2 minimum was first located. Starting from this minimum and stretching the O-O bond in steps while optimizing all other degrees of freedom, a large number of unsuccessful attempts were made to locate a reasonably low transition state. Starting instead from the other end of the O-O cleavage reaction, a possible structure for compound Q was first located. This structure was chosen as the one with the shortest Fe-Fe distance, in order to reproduce the short distance measured by EXAFS [76], and it had two bridging carboxylates in addition to the two bis-μ-oxo bonds; see further below. With this structure as a start, the oxygens were moved together in
Figure 19: Best present model for compound Q of methane monooxygenase.
steps, but this did not lead to a low transition state either. Only when one of the bridging carboxylates, Glu243, was moved over to one of the irons did the same approach start to yield reasonable energies. After different spin states had been tried, the best state turned out to be the ferromagnetic ground state of compound P, ¹¹A. The approach of moving stepwise along an assumed reaction path while optimizing all other degrees of freedom, to find a maximum in energy, gives only an approximate transition state, which should yield a lower bound to the true barrier. A fully proper transition state optimization, where all degrees of freedom including the reaction coordinate are optimized simultaneously, was not successful. The estimated (lower bound) barrier for O-O cleavage was 17 kcal/mol. A few problems with the suggested reaction pathway were noted, the most significant one being that the O-O cleavage was found to be endothermic by 9 kcal/mol. This was mainly ascribed to an inadequacy of B3LYP for this difficult reaction. Quite recently another study of different intermediates of MMO has been made by Friesner et al. [77]. Their models contained an impressive number of about 100 atoms, which makes that B3LYP study the largest one made
so far for a system containing a transition metal. For the present discussion of O-O bond cleavage, the most important result of that study was a new structure of compound Q, which is slightly different from the one obtained in the previous study, which had two bridging carboxylates [75]. Instead, there is only one bridging carboxylate, and one of the irons has a bound water. It is important to note that the higher stability found for this singly bridged carboxylate structure compared to the doubly bridged one is not a result of the larger model. For the smaller model used previously [75] it is also more stable, by about 10 kcal/mol, but this structure was never tried in the previous study, where one goal was to find a complex with the shortest possible Fe-Fe distance. The double bridge makes the Fe-Fe distance about 0.05 Å shorter. The optimized structure with the single carboxylate bridge for the small model is shown in Figure 19. This structure is more stable than the doubly bridged one because a water molecule in the second coordination sphere is extremely weakly bound to this very hydrophobic complex. Therefore, there is a large gain of energy when the water binds directly to the iron. A few additional comments can be made about compound Q. First, the iron spins are antiferromagnetically coupled to a ¹A state, in agreement with experiment. In the earlier study, it was concluded that the antiferromagnetic coupling is one reason the Fe-Fe bond is so short. This conclusion was drawn because the ferromagnetic ⁹A state was found to have a much longer Fe-Fe bond. However, it has recently been found that there is also a ferromagnetic ⁵A state that is extremely similar in structure to the antiferromagnetic ¹A structure [78]. The electronic structure is also very similar. It is instead concluded that the local low-spin coupling of the spins on each iron is responsible for the short Fe-Fe distance.
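The spin labels used here (¹¹A for P; ⁹A, ⁵A and ¹A for Q) follow from adding local iron spins. The sketch below assumes idealized, collinear spin-only coupling, with high-spin Fe(III) as S = 5/2, high-spin Fe(IV) as S = 2 and locally low-spin Fe(IV) as S = 1; computed spin populations are smaller than these ideal values because of covalency. The last case, two irons plus one unpaired spin shared by the oxygens, corresponds to the ¹¹A Fe2(III,IV) structure formed directly in the O2 activation step.

```python
from fractions import Fraction


def multiplicity(local_spins, signs) -> int:
    """2S+1 for collinearly coupled local spins (sign +1 = parallel, -1 = antiparallel)."""
    total = abs(sum(Fraction(s) * sign for s, sign in zip(local_spins, signs)))
    return int(2 * total + 1)


HALF = Fraction(1, 2)
CASES = {
    # label: (idealized local spins, coupling signs)
    "P, two high-spin Fe(III), ferromagnetic": ([5 * HALF, 5 * HALF], [1, 1]),
    "Q, two high-spin Fe(IV), ferromagnetic": ([2, 2], [1, 1]),
    "Q, two low-spin Fe(IV), ferromagnetic": ([1, 1], [1, 1]),
    "Q, two low-spin Fe(IV), antiferromagnetic": ([1, 1], [1, -1]),
    "Fe2(III,IV) plus oxygen radical, all parallel": ([5 * HALF, 2, HALF], [1, 1, 1]),
}

if __name__ == "__main__":
    for label, (spins, signs) in CASES.items():
        print(f"{label}: multiplicity {multiplicity(spins, signs)}")
```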
With the new, lower-energy structure of compound Q there was good hope that the high endothermicity of the O-O bond cleavage found previously would disappear. Indeed, the reaction is now close to thermoneutral. The suggested pathway for O2 activation is shown in scheme 6. This pathway differs somewhat from the previous one [75], but again this is not due to the new structure but rather to a more careful search of the large number of different potential surfaces with similar energies. The best pathway is in fact quite similar for the new and the old structures of compound Q. The reaction scheme can be described as follows. As before, the O-O bond cleavage occurs on the ground-state ¹¹A potential surface of the reactant P. At the transition state, the oxygens have moved significantly closer
Figure 20: Suggested reaction sequence for the O-O bond cleavage in methane monooxygenase.
to one of the irons, from which there is an electron transfer. At the transition state the oxidation state of this iron is somewhere in between Fe(III) (s=4.0) and Fe(IV) (s=3.1), as shown for the optimized structure in Figure 21. This is a fully optimized structure, obtained using a B3LYP Hessian, with one imaginary frequency of 764i cm⁻¹. The other iron is thus quite passive in the bond cleavage and remains Fe(III). The direct product of the O2 activation is an ¹¹A Fe2(III,IV) structure in which the oxygens share one spin. Incidentally, this low-lying state also turns out to be the most likely active species in the subsequent hydroxylation step, and it is quite similar to the active structure suggested in the first B3LYP study of MMO, where very simple ligand models were used [72]. Compound Q is only formed after the O-O bond cleavage is finished, when a second electron transfer occurs from the originally passive iron center. The calculated barrier for O-O bond cleavage is 19.9 kcal/mol, which is somewhat higher than the experimental estimate of around 17 kcal/mol, but the agreement is still reasonable. Finally, it should be noted that the O-O bond cleavage in MMO follows the same pattern as all the other O-O bond cleavage and formation reactions discussed here, namely that the reaction leads to a change of spin state. The reactant P has spin 11 and the product Q has spin
Figure 21: Transition state structure for O-O bond cleavage in methane monooxygenase.
5 (assuming ferromagnetic coupling). In this case the reaction occurs on the reactant ground state, which is an excited state of the product. The spin transition only occurs after the O-O bond cleavage is finished. The homolytic O-O bond cleavage mechanism suggested by both the previous and the most recent studies differs from the heterolytic one suggested on the basis of experimental studies of the pH dependence [70]; see above. It is therefore of interest to investigate whether the measured pH dependence could also be rationalized for a homolytic process. At present, a theoretical study of the pH dependence of this type of reaction is not possible, so more general information has to be used. For this purpose the results from the Hessian calculations and from dielectric cavity models are useful. The temperature dependence obtained from the Hessians turns out to be quite surprising. The entropy effect in the reaction, normally found to be quite negligible, is in this case large, 8.9 kcal/mol, making the reaction endothermic by about the same amount. The reason is that the complex decreases significantly in size, with an Fe-Fe bond distance change from P to Q of almost one Å. This leads to an overall increase of the vibrational frequencies, including the smaller ones that are important for the entropy, and hence to a smaller entropy for Q than for P. With this finding, an effect of similar size in the opposite direction becomes necessary to bring the calculated reaction energy into agreement with experiment. A calculation of the dielectric effects using the SCI-PCM method [20] gives no hint of such an effect. However, a decrease in size of the complex should not only lead to a lower entropy; it should also make the complex easier to fit into the enzyme and therefore cause less strain in the system. This effect should favor the smaller compound Q over P. Precisely how large this cavitation effect is, is difficult to determine and would require an entire molecular modeling study. Instead, the DPCM method [19] can be used to estimate the cavitation effect. Using this method, the cavitation energy effect is estimated to be 5.0 kcal/mol if the solvent is water, and 4.5 kcal/mol if the solvent is diethyl ether, which has somewhat weaker hydrogen bonds. For an enzyme the effect is expected to be larger than this, since the rigidity of the backbone is also involved. This means that the cavitation energy could well cancel the entropy effect. Since the cavity effect will depend strongly on the strength of the hydrogen bonding network, it is also likely to depend strongly on pH. At lower pH, the additional protons are expected to spread their positive charges and make the hydrogen bonding stronger. Therefore, the best suggestion at present for making the present results consistent with the measured pH dependence is that the increased rate at lower pH is due to a larger cavitation effect. To investigate whether additional protons have more direct effects on the complex, protons were added at different places and the energy difference between P and Q was recomputed.
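The argument that a uniform upshift of the vibrational frequencies lowers the entropy can be illustrated with the harmonic-oscillator vibrational entropy. The frequencies and the 30 % stiffening below are hypothetical placeholders, not the Hessian results from the study, so the printed -TΔS is not the 8.9 kcal/mol quoted above; the sketch only shows the direction of the effect and the dominance of the soft modes.

```python
import math

H_PLANCK = 6.62607015e-34   # Planck constant, J*s
C_CM = 2.99792458e10        # speed of light, cm/s
K_B = 1.380649e-23          # Boltzmann constant, J/K
R_KCAL = 1.987204e-3        # gas constant, kcal/(mol*K)


def vibrational_entropy(freqs_cm, temperature: float = 298.15) -> float:
    """Harmonic-oscillator vibrational entropy in kcal/(mol*K) for wavenumbers in cm^-1."""
    s_over_r = 0.0
    for nu in freqs_cm:
        x = H_PLANCK * C_CM * nu / (K_B * temperature)
        # Per mode: S/R = x/(e^x - 1) - ln(1 - e^-x), written with expm1 for accuracy
        s_over_r += x / math.expm1(x) - math.log(-math.expm1(-x))
    return R_KCAL * s_over_r


# Hypothetical soft modes of a "large" complex and the same modes stiffened by 30 %
FREQS_LARGE = [40.0, 65.0, 90.0, 130.0, 180.0, 250.0, 350.0, 500.0]
FREQS_COMPACT = [1.3 * nu for nu in FREQS_LARGE]

if __name__ == "__main__":
    T = 298.15
    ds = vibrational_entropy(FREQS_COMPACT, T) - vibrational_entropy(FREQS_LARGE, T)
    print(f"-T*dS = {-T * ds:.2f} kcal/mol (positive: entropy disfavors the compact form)")
```

In the low-frequency (classical) limit each mode's entropy changes by -R ln 1.3 ≈ -0.26R when its frequency is scaled by 1.3, so a complex with many soft modes accumulates a sizable entropy penalty when it contracts.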
However, no effects leading to a relative stabilization of the product were found. As a final comment, it is also quite difficult to understand how a heterolytic splitting could occur in practice for MMO. Such a splitting would require that one of the bis-μ-oxo bridges does not come from O2. A mechanism in which a μ-oxo bridge is always present in the complex is not possible either, since no such bridge is present in the X-ray structures of either the reduced or the oxidized form of MMO.
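The entropy argument above runs as follows: a more compact complex has higher vibrational frequencies, which gives a lower entropy. The standard harmonic-oscillator expression for the vibrational entropy, S_vib = R Σ [x/(e^x - 1) - ln(1 - e^-x)] with x = hcν̃/kT, makes this concrete. The sketch below is illustrative only. It assumes purely harmonic modes, and the frequency lists for P and Q (including the 1.3 stiffening factor) are hypothetical, not taken from the calculations discussed here. It shows that raising the low-frequency modes lowers S, so -TΔS becomes positive for P → Q.

```python
import math

# Physical constants (SI units unless noted)
H = 6.62607015e-34      # Planck constant, J s
KB = 1.380649e-23       # Boltzmann constant, J/K
C_CM = 2.99792458e10    # speed of light, cm/s
R_KCAL = 1.987204e-3    # gas constant, kcal/(mol K)


def vib_entropy(freqs_cm, temperature=298.15):
    """Harmonic-oscillator vibrational entropy, kcal/(mol K).

    freqs_cm: real vibrational wavenumbers in cm^-1 (drop imaginary modes).
    """
    s_over_r = 0.0
    for nu in freqs_cm:
        x = H * C_CM * nu / (KB * temperature)   # h*c*nu / (k*T), dimensionless
        s_over_r += x / math.expm1(x) - math.log1p(-math.exp(-x))
    return R_KCAL * s_over_r


def minus_t_delta_s(freqs_start, freqs_end, temperature=298.15):
    """Vibrational -T*dS (kcal/mol) for start -> end; a positive value means
    the entropy change disfavours the end state."""
    ds = vib_entropy(freqs_end, temperature) - vib_entropy(freqs_start, temperature)
    return -temperature * ds


# Hypothetical frequency sets: Q is a "stiffer" copy of P.
freqs_p = [50.0, 80.0, 120.0, 300.0, 600.0]
freqs_q = [1.3 * f for f in freqs_p]
print(f"-T*dS(P->Q) = {minus_t_delta_s(freqs_p, freqs_q):.2f} kcal/mol")
```

Low-frequency modes dominate S_vib, which is why a change in the "smaller" frequencies matters most for the entropy.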
4.4. O-O bond activation in manganese catalase
Catalases are metalloenzymes that protect the cell from oxidative damage by excess hydrogen peroxide produced during O2 metabolism. Hydrogen peroxide is destroyed, forming oxygen and water, in a disproportionation reaction:
$$2\,\mathrm{H_2O_2} \rightarrow 2\,\mathrm{H_2O} + \mathrm{O_2} + \Delta E \qquad (8)$$
This reaction is exothermic by 52 kcal/mol [15], which means that one or several steps in the catalytic cycle of the enzyme may have a large driving force. There are two major classes of catalases. The most abundant class has an Fe protoporphyrin IX cofactor with a proximal tyrosinate ligand trans to the substrate binding position. In the first substrate reaction for this class, the resting ferric state cleaves the O-O bond of hydrogen peroxide. The mechanism is likely to be quite similar to the one discussed above for heme peroxidases. The second class of catalases instead has an active dimanganese complex. Crystal structures have been determined for the enzymes from Thermus thermophilus [79] and Lactobacillus plantarum [80]. Both structures show a bridged binuclear manganese cluster, see Figure 22. During turnover the binuclear cluster cycles between two sets of oxidation states: the reduced form, Mn2(II,II), and the oxidized form, Mn2(III,III). Both forms are in principle indefinitely stable.
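The 52 kcal/mol quoted for reaction (8) can be checked with Hess's law from gas-phase enthalpies of formation. In the sketch below, ΔE in (8) is the energy released, so the products-minus-reactants sum comes out negative. The 0 K values are approximate figures quoted from the NIST-JANAF tables for illustration and should be verified before any reuse. The dictionary and function names are hypothetical.

```python
# Approximate 0 K gas-phase enthalpies of formation, kJ/mol, quoted from the
# NIST-JANAF thermochemical tables for illustration; verify before reuse.
DHF0_KJ = {"H2O2": -129.9, "H2O": -238.9, "O2": 0.0}
KJ_PER_KCAL = 4.184


def reaction_energy_kcal(reactants, products, dhf=DHF0_KJ):
    """Hess's law: sum of n*dHf over products minus reactants, in kcal/mol.

    reactants/products map species names to stoichiometric coefficients.
    A negative result means the reaction releases energy (exothermic).
    """
    e_prod = sum(n * dhf[s] for s, n in products.items())
    e_reac = sum(n * dhf[s] for s, n in reactants.items())
    return (e_prod - e_reac) / KJ_PER_KCAL


# Reaction (8): 2 H2O2 -> 2 H2O + O2
de = reaction_energy_kcal({"H2O2": 2}, {"H2O": 2, "O2": 1})
print(f"reaction energy: {de:.1f} kcal/mol")
```

With these inputs the sum comes out close to -52 kcal/mol, in line with the value cited from [15].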
Figure 22: The oxidized bridged binuclear manganese cluster of Mn-catalase.
Several mechanisms have been suggested for the O-O bond cleavage in manganese catalase. In one of them, internal protein residues play a key role as proton acceptors and donors [81]. In the first step, the substrate binds directly to a terminal site on Mn(II) by displacing a water ligand. An internal protein residue is then assumed to act as proton acceptor from hydrogen peroxide. The O-O bond is suggested to be cleaved after a critical step in which the terminally ligated substrate swings in and forms a μ-η2-peroxide. At this stage the O-O bond
is broken in a process where the residue delivers the proton back to form product water. In another proposed mechanism [82], the lability of the bridging group of the reduced complex is suggested to allow substitutional insertion of peroxide at this position. Isomerization of the peroxide to the gem-protonated isomer is then suggested to polarize the O-O bond sufficiently for heterolytic bond cleavage. A third suggested mechanism [83] bases the catalytic cycle on interconversion of the Mn2(II,II)(μ-OH)2 and Mn2(III,III)(μ-O)2 core structures.
Figure 23: Suggested reaction sequence (Steps 1-3) for the O-O bond cleavage step in manganese catalase.
The full catalytic cycle of manganese catalase has recently been studied using B3LYP [84]. This cycle includes both the O-O bond cleavage of the first H2O2 and the formation of O2 from the second H2O2, but only the bond cleavage will be discussed here. After several possibilities were investigated, the reaction sequence in Figure 23 was found to be the most favorable. In the first step of this scheme, H2O2 replaces a bridging water of the reduced Mn2(II,II) dimer, which, as expected, is a nearly thermoneutral process. At this point the O-O bond should be cleaved. As discussed several times in this review, this leads to a spin problem for complexes of the present type, which have a weak ligand field. Assuming ferromagnetic coupling, which is an excellent approximation for these systems, the neutral
Mn2(II,II) reactant will have spin multiplicity 11, while the Mn2(III,III) product will have multiplicity 9. A spin crossing is therefore needed at some point along the reaction path, irrespective of whether the O-O bond cleavage is homolytic or heterolytic. This situation was discussed above for O-O bond formation in PSII. The present situation is just the reverse, so the same arguments apply and the conclusion is repeated here. For a low-barrier reaction,
either the excited state of the reactant corresponding to the product ground state (the low-spin state), or the excited state of the product corresponding to the reactant ground state (the high-spin state), has to be low-lying. In the present case the choice is simple, since the reaction is found to be strongly exothermic, by more than 40 kcal/mol. Essentially the only possibility is that the reaction occurs on the reactant ground-state surface, which corresponds to an excited state of the product. Given the high exothermicity, there is good reason to expect the excited state of the product to lie low compared to the reactant ground state, and this is also found to be the case. The fully optimized transition state structure for O-O bond cleavage on this potential surface is shown in Figure 24. This structure has an imaginary frequency of i544 cm-1. It was very difficult to locate and required several Hessian determinations. A critical point is the close contact between one of the oxygens of H2O2 and the active manganese, which is necessary for electron transfer. The spin of this manganese, given in the figure, lies between the values for Mn(II) (s=4.8) and Mn(III) (s=3.9). The rest of the spin is on the inactive manganese, which remains at s=4.8, and on the outermost oxygen of H2O2, which has a spin of 0.40. The calculated barrier is 15.4 kcal/mol. This appears somewhat high for this reaction, since manganese catalases dismutate H2O2 at exceedingly high rates approaching the diffusion limit [81]. For this reason, the effect of an additional proton from an external proton donor was also investigated. The added proton turns out to move over from H2O2 to Glu36. O-O bond cleavage then follows with a mechanism and barrier height essentially identical to those for the neutral system.
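The multiplicities quoted above (11 for Mn2(II,II), 9 for Mn2(III,III)) follow from simple spin counting. The sketch assumes that each ion is high-spin in the weak ligand field and that all spins, including any ligand radical counted as S = 1/2, couple ferromagnetically (they simply add). The helper names are hypothetical. The Mn2(III,II) + OH-radical intermediate comes out with the same multiplicity as the reactant, which is why the O-O cleavage can proceed on the reactant ground-state surface before the spin crossing.

```python
def high_spin_s(n_d):
    """Spin S of a high-spin (weak-field) d^n ion: Hund's rule, one unpaired
    electron per singly occupied d orbital."""
    if not 0 <= n_d <= 10:
        raise ValueError("d-electron count must be between 0 and 10")
    unpaired = n_d if n_d <= 5 else 10 - n_d
    return unpaired / 2


def ferro_multiplicity(d_counts, ligand_radicals=0):
    """Multiplicity 2S+1 when all metal spins and any ligand radical spins
    (S = 1/2 each) are coupled ferromagnetically, i.e. simply added."""
    s_total = sum(high_spin_s(n) for n in d_counts) + 0.5 * ligand_radicals
    return round(2 * s_total + 1)


# Mn(II) is d5 and Mn(III) is d4.
print(ferro_multiplicity([5, 5]))                     # Mn2(II,II) reactant -> 11
print(ferro_multiplicity([4, 5], ligand_radicals=1))  # Mn2(III,II) + OH radical -> 11
print(ferro_multiplicity([4, 4]))                     # Mn2(III,III) product -> 9
```

The same counting gives S = 2 (a quintet) for both high-spin d6 Fe(II) and d4 Fe(IV), the situation met in the isopenicillin N synthase case.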
It is therefore concluded that either the mechanism is slightly different (with H2O2 initially binding to the empty site, for example), or the B3LYP method is simply responsible for the too high barrier. With an average error of 3.1 kcal/mol, see section 2, there is always the possibility that a barrier is overestimated by 3-5 kcal/mol. The product of the neutral reaction is a Mn2(III,II) complex with an OH radical. This part of the reaction is weakly exothermic, by 4 kcal/mol. In the next, third, step there is a spin crossing, which is exothermic by about 40 kcal/mol and leads to the oxidized Mn2(III,III) dimer. The type of chemistry occurring in these steps is termed Fenton chemistry, and it is actually what the manganese catalase is there to prevent. To prevent the OH radical produced in the first step from escaping, the radical is immediately captured by the empty site of the second manganese. There it is destroyed by the spin crossing. The energetics of the O-O bond cleavage in manganese catalase are shown schematically in Figure 25. Results are given for two models, with imidazole and with ammonia modeling the histidines. As seen in the figure, these results are very similar. Ammonia ligands are therefore good and useful models for the electronic structure effects of histidine ligands. However, ammonia has more flexible hydrogen bonding capabilities than imidazole, so a requirement for using ammonia as a model is that no artificial hydrogen bonds are introduced. It is therefore important to stress that in the present manganese catalase models, neither the ammonias nor the imidazoles form any hydrogen bonds at any stage.
Figure 24: Transition state structure for O-O bond cleavage in manganese catalase.
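The kinetic weight of the barrier discussion can be estimated with conventional transition state theory (the Eyring expression), which the text does not use explicitly. The sketch assumes room temperature and a transmission coefficient of 1, and it treats the computed electronic barrier as a free energy barrier. It compares the computed 15.4 kcal/mol with the same barrier lowered by the 3-5 kcal/mol that a B3LYP overestimate could amount to.

```python
import math

KB = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal, temperature=298.15):
    """First-order rate constant (s^-1) from conventional transition state
    theory, k = (kB*T/h) * exp(-dG/RT), with transmission coefficient 1."""
    return KB * temperature / H * math.exp(-barrier_kcal / (R_KCAL * temperature))


# Computed barrier, and the same barrier lowered by 3 and 5 kcal/mol to mimic
# a possible B3LYP overestimate of that size.
for barrier in (15.4, 12.4, 10.4):
    print(f"{barrier:5.1f} kcal/mol -> k ~ {eyring_rate(barrier):.1e} s^-1")
```

At room temperature, every ~1.4 kcal/mol (RT ln 10) changes the rate by a factor of ten. A 3-5 kcal/mol error therefore moves the estimate by several orders of magnitude, from tens per second to around 10^5 s^-1.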
Figure 25: Energy diagram (energy in kcal/mol versus reaction coordinate, Steps 1-3) for the suggested mechanism for O-O bond cleavage in manganese catalase. The results using the imidazole model are marked with the thick line and those for the ammonia model with the thin line.
4.5. O-O bond activation in isopenicillin N synthase
Isopenicillin N synthase (IPNS) is a mononuclear non-heme iron enzyme that plays an important role in the biosynthesis of antibiotics [85]. Using one O2 molecule, the enzyme catalyzes the bicyclic ring closure of the substrate δ-(L-α-aminoadipoyl)-L-cysteinyl-D-valine (ACV), forming two water molecules and isopenicillin N (IPN). IPN is a precursor of the penicillin and cephalosporin antibiotics [85, 86, 87]. Recently, the X-ray crystal structure of IPNS from Aspergillus nidulans was obtained, complexed with manganese instead of iron. The crystal structure shows that the metal is octahedrally coordinated by two histidines, one aspartate, one glutamine and two water molecules [88]. On the basis of substrate analogues and isotopic labeling experiments, a catalytic mechanism for IPNS was first proposed by Baldwin et al. [89, 90]. Additional experimental data giving a more detailed picture of the enzyme mechanism have been gathered in two recent reviews [66, 91]. In the step following substrate binding in this mechanism, the Cys β-C-H hydrogen of ACV migrates to the dioxygen, giving a peroxide, and the four-membered β-lactam ring is formed. Protonation of the peroxide then leads to the loss of a second water molecule and formation of an oxo-ferryl intermediate. The mechanism of isopenicillin N synthase has recently been studied theoretically using the B3LYP method [92]. The catalytic cycle is suggested to occur in 8 steps, with the closure of the four-membered β-lactam ring preceding the closure of the five-membered thiazolidine ring. Two of the reaction steps were found to have significant barriers of similar size, indicating that these steps could be rate-determining. One of these is the β-lactam ring closure. It is the only step discussed here, since it involves the O2 activation. The suggested reaction sequence for this part is shown in Figure 26. The scheme starts where the substrate has formed a sulfur bond to iron in oxidation state Fe(II), giving structure 1. At this point an oxygen molecule replaces a water ligand to form complex 2. This is followed by a hydrogen atom transfer via transition structure 3, leading to the peroxide structure 4.
Figure 26: Suggested reaction sequence for the first step in isopenicillin N synthase.
The next step is the O-O bond cleavage reaction leading to structure 6. This step should not be rate-limiting and should therefore have a barrier below 15-17 kcal/mol [89]. As a first attempt, an O-O bond cleavage was tried in which a proton moves from the Val nitrogen to the peroxy group, with simultaneous ring closure and formation of water, as shown in 4→6. In this reaction the iron oxidation state changes by two units, from Fe(II) to Fe(IV). With a weak ligand field, both these oxidation states have quintet ground states, so this reaction could in principle occur on a single potential surface all the way from reactants to products. This would make it unique among the O-O bond cleavage and formation reactions studied here. The optimized transition state structure 5 is shown in Figure 27. Even though this straightforward pathway is spin-allowed, it has a very high barrier of 34 kcal/mol. That is at least 15 kcal/mol higher than what should be expected for this step. Since this discrepancy between the experimental and calculated barriers is much larger than the normal errors of the present methods, it is concluded that a different mechanism must apply for this step. The barrier is so high because a large change in electronic structure is needed for iron to change its oxidation state by two units in one concerted step. The first alternative tried was a reaction on an excited surface instead of on the reactant ground-state surface. In this case the excited state of the Fe(II)-peroxide reactant is an Fe(III) complex with a substrate radical. At first sight this may seem a reasonable alternative, in which the product would be an initial Fe(III)-oxyl radical state with a closed ring. However, the computed excitation energy to reach this state from the reactant is exceedingly high, so this mechanism is also ruled out.
Figure 27: The optimized transition state structure 5 for O-O bond cleavage in isopenicillin N synthase.
Instead, an outside proton donor is invoked. If a proton is transferred to the peroxide from outside and the ring closure occurs after the water is formed, the barrier is significantly lowered. Studies of other HOO-Fe-containing systems [55, 64] show that protonation of the peroxide group can lead directly to dissociation of the water molecule and formation of an iron-oxo intermediate, with only a small barrier or even none (see above for cytochrome oxidase and heme peroxidases). Protonation of the hydroperoxy group followed by dissociation of water is probably fast if a cheap proton is available to assist the reaction. Even so, the activation energy obtained for the next step, including the ring closure, is still rather high. The energy of the transition state (5′) is +20.9 kcal/mol relative to 4′. When the ring closure is completed, the proton is proposed to leave the active site (6′ → 6). Throughout the step 4′ → 6, iron is in oxidation state Fe(IV). The β-lactam ring formation step (4 → 6) was found to be exothermic by 3.3 kcal/mol.
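For orientation, the requirement that this step have a barrier below 15-17 kcal/mol can be translated into rate constants by inverting the Eyring expression. This is an illustration, not the derivation used in [89]. It assumes conventional transition state theory at 298 K with a transmission coefficient of 1, and it treats the computed barriers as free energies. The function name is hypothetical.

```python
import math

KB = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol K)


def max_barrier(rate_s, temperature=298.15):
    """Barrier (kcal/mol) that gives a first-order rate constant rate_s in
    conventional TST: dG = RT * ln(kB*T / (h*k))."""
    if rate_s <= 0:
        raise ValueError("rate constant must be positive")
    return R_KCAL * temperature * math.log(KB * temperature / (H * rate_s))


for k in (1.0, 10.0, 100.0):
    print(f"k = {k:6.1f} s^-1 -> barrier {max_barrier(k):.1f} kcal/mol")

# Computed barriers for the beta-lactam closure step (kcal/mol), measured
# against the barrier corresponding to 1 s^-1.
for label, barrier in (("direct cleavage, TS 5", 34.0), ("protonated path, TS 5'", 20.9)):
    excess = barrier - max_barrier(1.0)
    print(f"{label}: {excess:.1f} kcal/mol above the 1 s^-1 barrier")
```

On these assumptions, the 15-17 kcal/mol window corresponds roughly to rates between 100 and 1 s^-1. The 34 kcal/mol path lies far outside that window, while the protonated path is much closer to it.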
5. CONCLUSIONS
In the present review, a few theoretical studies of biological reactions involving formation and cleavage of O-O bonds have been described. These reactions have some general features in common. The most important one is that whenever an O-O bond is formed or broken, there is a change of potential surface, usually involving a spin-state change. This change occurs because at least two electrons are needed to break the O-O bond, and these are taken from the transition metals in the complex. The oxidation states therefore change, which in general also changes the spin state of the metal. An important lesson from these studies is that the spin-state change occurs either before the reaction has started or after it is completed (with the possible exception of heme peroxidases). This means that the reactions occur on the excited-state potential surface of either the reactants or the products, whichever is lower in energy. A consequence of this finding is that in most cases a ligand radical will appear either directly before or directly after the reaction. For the O-O bond formation in PSII, an oxyl radical is suggested to be formed in the critical S3 state, bridging two manganese centers of the water-oxidizing cluster, after which the O-O bond is formed. The structure of the manganese cluster and the mechanisms of the earlier S-state transitions are designed to make this formidable chemical task possible. In manganese catalase, where an O-O bond is broken, a hydroxyl radical is formed as the direct product in a Fenton-type reaction. This is a much simpler chemical task than the one in PSII. It is made possible simply by selecting a complex that gives a very exothermic O-O bond cleavage, which is not unusual for a reaction between a transition metal complex and H2O2.
O-O bond dissociation involving heme complexes is slightly different from that in PSII and manganese catalase, since the hemes have a stronger ligand field. Nevertheless, in these cases too the O-O bond cleavage occurs on an excited state of either the reactants or the products, and it also leads to the appearance of radicals. At the end of the reaction there will be a tyrosyl radical in cytochrome oxidase, a tryptophan radical in cytochrome c peroxidase, and a heme radical in P-450. In these cases the reaction occurs on the excited state of the product, since it has the lowest-lying excited state. Another feature of general interest is that in most of the present reactions, protonation of the oxygens plays a key role in the O-O bond breaking process. In cytochrome oxidase an additional proton at the active site is suggested to be necessary for bond breaking. In heme peroxidases, the distal histidine is essential for protonating the oxygens, deprotonating them, or both. In manganese catalase, a carboxylate deprotonates one of the oxygens of H2O2 at the same time as the O-O bond is cleaved. In isopenicillin N synthase, an outside proton donor is suggested to play an important role. Finally, the O-O bond formation in PSII occurs in concert with deprotonation of the external water involved. The B3LYP method has been used in all the studies described here. This is not a coincidence, since it is by far the most frequently used method for high-accuracy studies of biological systems. Most of the models discussed contained 30-50 atoms, but studies have been performed with up to 100 atoms in the model. With the rapid development of computer hardware and software, this limit is likely to be extended even further in the coming years.
References [1] D. Voet, J.G. Voet, Biochemistry, (J. Wiley and Sons, Inc., New York, 1995). [2] P.E.M. Siegbahn, Adv. Chem. Phys., Vol. XCIII (1996), p.333 (edited by I. Prigogine and S.A. Rice, J. Wiley). [3] A.D. Becke, Phys. Rev., A38 (1988) 3098. [4] A.D. Becke, J. Chem. Phys., 96 (1992) 2155-2160. [5] A.D. Becke, J. Chem. Phys., 98 (1993) 5648-5652. [6] P.E.M. Siegbahn, M.R.A. Blomberg, Ann. Rev. Phys. Chem., 50 (1999) 221-249. [7] P.E.M. Siegbahn, M.R.A. Blomberg, Chem. Rev., in press.
[8] P.E.M. Siegbahn, R.H. Crabtree, in Metal-Oxo and Metal-Peroxo Species in Catalytic Oxidations, edited by B. Meunier, Springer, Heidelberg, in press. [9] P.J. Stephens, F.J. Devlin, C.F. Chabalowski, M.J. Frisch, J. Phys. Chem., 98 (1994) 11623. [10] C. Lee, W. Yang, R.G. Parr, Phys. Rev., B37 (1988) 785.
[11] S.H. Vosko, L. Wilk, M. Nusair, Can. J. Phys., 58 (1980) 1200. [12] a. J.P. Perdew, Y. Wang, Phys. Rev. B, 45 (1992) 13244, b. J.P. Perdew, in Electronic Structure of Solids, eds. P. Ziesche, H. Eschrig, Akademie Verlag, Berlin (1991), c. J.P. Perdew, J.A. Chevary, S.H. Vosko, K.A. Jackson, M.R. Pederson, D.J. Singh, C. Fiolhais, Phys. Rev. B, 46 (1992) 6671. [13] M.J. Frisch, G.W. Trucks, H.B. Schlegel, P.M.W. Gill, B.G. Johnson, M.A. Robb, J.R. Cheeseman, T. Keith, G.A. Petersson, J.A. Montgomery, K. Raghavachari, M.A. Al-Laham, V.G. Zakrzewski, J.V. Ortiz, J.B. Foresman, J. Cioslowski, B.B. Stefanov, A. Nanayakkara, M. Challacombe, C.Y. Peng, P.Y. Ayala, W. Chen, M.W. Wong, J.L. Andres, E.S. Replogle, R. Gomperts, R.L. Martin, D.J. Fox, J.S. Binkley, D.J. Defrees, J. Baker, J.P. Stewart, M. Head-Gordon, C. Gonzalez, J.A. Pople, Gaussian 94, Revision B.2, 1995, Gaussian Inc., Pittsburgh, PA. [14] M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman, V.G. Zakrzewski, J.A. Montgomery, Jr., R.E. Stratmann, J.C. Burant, S. Dapprich, J.M. Millam, A.D. Daniels, K.N. Kudin, M.C. Strain, O. Farkas, J. Tomasi, V. Barone, M. Cossi, R. Cammi, B. Mennucci, C. Pomelli, C. Adamo, S. Clifford, J. Ochterski, G.A. Petersson, P.Y. Ayala, Q. Cui, K. Morokuma, D.K. Malick, A.D. Rabuck, K. Raghavachari, J.B. Foresman, J. Cioslowski, J.V. Ortiz, B.B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. Gomperts, R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham, C.Y. Peng, A. Nanayakkara, C. Gonzalez, M. Challacombe, P.M.W. Gill, B. Johnson, W. Chen, M.W. Wong, J.L. Andres, M. Head-Gordon, E.S. Replogle, J.A. Pople, Gaussian 98, 1998, Gaussian Inc., Pittsburgh, PA. [15] L.A. Curtiss, K. Raghavachari, P.C. Redfern, J.A. Pople, J. Chem. Phys., 106 (1997) 1063-1079. [16] L.A. Curtiss, K. Raghavachari, P.C. Redfern, V. Rassolov, J.A. Pople, J. Chem. Phys., 109 (1998) 7764-7776. [17] C.W. Bauschlicher, Jr., A. Ricca, H. Partridge, S.R. Langhoff, in Recent Advances in Density Functional Methods, Part II, Ed. D.P. Chong, p. 165 (World Scientific Publishing Company, Singapore, 1997). [18] K.A. Gardner, J.M. Mayer, Science, 269 (1995) 1849, and private communication. [19] a. S. Miertus, E. Scrocco, J. Tomasi, Chem. Phys., 55 (1981) 117, b. S. Miertus, J. Tomasi, Chem. Phys., 65 (1982) 239, c. M. Cossi, V. Barone, R. Cammi, J. Tomasi, Chem. Phys. Lett., 255 (1996) 327.
[20] a. K.B. Wiberg, P.R. Rablen, D.J. Rush, T.A. Keith, J. Am. Chem. Soc., 117 (1995) 4261, b. K.B. Wiberg, T.A. Keith, M.J. Frisch, M. Murcko, J. Phys. Chem., 99 (1995) 9072. [21] S.J. Lippard, J.M. Berg, Principles of Bioinorganic Chemistry (1994), University Science Books, Mill Valley, CA, p. 218. [22] P. Nordlund, B.-M. Sjöberg, H. Eklund, Nature, 345 (1990) 593-598. [23] X.-D. Su, B.-O. Persson, B.-M. Sjöberg, P. Nordlund, in manuscript. [24] D.F. Ghanotakis, G.T. Babcock, C.F. Yocum, FEBS Lett., 167 (1984) 127-130. [25] K. Lindberg, L. Andreasson, Biochemistry, 35 (1996) 14259. [26] V.K. Yachandra, K. Sauer, M.P. Klein, Chem. Rev., 96 (1996) 2927-2950. [27] G.C. Dismukes, Y. Siderer, Proc. Natl. Acad. Sci., 78 (1981) 274-278. [28] C.E. Dubé, R. Sessoli, M.P. Hendrich, D. Gatteschi, W.H. Armstrong, J. Am. Chem. Soc., 121 (1999) 3537-3538. [29] K.A. Åhrling, P.J. Smith, R.J. Pace, J. Am. Chem. Soc., 120 (1998) 13202-13214. [30] R.M. Cinco, J.H. Robblee, A. Rompel, C. Fernandez, V.K. Yachandra, K. Sauer, M.P. Klein, J. Phys. Chem., B102 (1998) 8248-8256. [31] B. Kok, B. Forbush, M. McGloin, Photochem. Photobiol., 11 (1970) 457. [32] a. C.W. Hoganson, N. Lydakis-Simantiris, X.-S. Tang, C. Tommos, K. Warncke, G.T. Babcock, B.A. Diner, J. McCracken, S. Styring, Photosynth. Res., 46 (1995) 177, b. G.T. Babcock, in Photosynthesis: from Light to Biosphere, ed. P. Mathis, Kluwer, Dordrecht, (1995) Vol. 2, p. 209, c. C. Tommos, X.-S. Tang, K. Warncke, C.W. Hoganson, S. Styring, J. McCracken, B.A. Diner, G.T. Babcock, J. Am. Chem. Soc., 117 (1995) 10325. [33] a. M. Haumann, O. Bögershausen, D. Cherepanov, R. Ahlbrink, W. Junge, Photosynth. Res., 51 (1997) 193-208, b. R. Ahlbrink, M. Haumann, D. Cherepanov, O. Bögershausen, A. Mulkidjanian, W. Junge, Biochemistry, 37 (1998) 1131-1142, c. M. Haumann, W. Junge, Biochim. Biophys. Acta, 1411 (1999) 86-91. [34] M.R.A. Blomberg, P.E.M. Siegbahn, S. Styring, G.T. Babcock, B. Åkermark, P. Korall, J. Am. Chem. Soc., 119 (1997) 8285-8292.
[35] M.R.A. Blomberg, P.E.M. Siegbahn, Theor. Chem. Acc., 97 (1997) 72-80. [36] M.R.A. Blomberg, P.E.M. Siegbahn, Mol. Phys., 96 (1999) 571-581. [37] P.E.M. Siegbahn, Inorg. Chem., in press. [38] P.E.M. Siegbahn, in Molecular Modeling and Dynamics of Bioinorganic Systems, P. Comba, L. Banci, Eds., Kluwer Academic Publishers, (1997) 233-253.
[39] P.E.M. Siegbahn, R.H. Crabtree, J. Am. Chem. Soc., 121 (1999) 117-127. [40] a. S.A. Styring, A.W. Rutherford, Biochemistry, 27 (1988) 4915-4923, b. R.G. Evelo, S.A. Styring, A.W. Rutherford, A.J. Hoff, Biochim. Biophys. Acta, 973 (1989) 428-442. [41] R.R. Sharp, in Manganese Redox Enzymes, V.L. Pecoraro, Ed., VCH, New York, (1992) 177-196. [42] L. Iuzzolini, J. Dittmer, W. Dörner, W. Meyer-Klaucke, H. Dau, Biochemistry, 37 (1998) 17112-17119. [43] T. Ono, T. Noguchi, Y. Inoue, M. Kusunoki, T. Matsushita, H. Oyanagi, Science, 258 (1992) 1335-1337. [44] J. Messinger, M. Badger, T. Wydrzynski, Proc. Natl. Acad. Sci. USA, 92 (1995) 3209-3213. [45] S. Yoshikawa, K. Shinzawa-Itoh, R. Nakashima, R. Yaono, E. Yamashita, N. Inoue, M. Yao, M.J. Fei, C.P. Libeu, T. Mizushima, H. Yamaguchi, T. Tomizaki, T. Tsukihara, Science, 280 (1998) 1723-1729. [46] C. Ostermeier, A. Harrenga, U. Ermler, H. Michel, Proc. Natl. Acad. Sci. USA, 94 (1997) 10547-10553. [47] G.T. Babcock, M. Wikström, Nature, 356 (1992) 301-309. [48] T. Ogura, S. Takahashi, K. Shinzawa-Itoh, S. Yoshikawa, T. Kitagawa, Bull. Chem. Soc. Jpn., 64 (1991) 2901-2907. [49] C. Varotsis, Y. Zhang, E.H. Appelman, G.T. Babcock, Proc. Natl. Acad. Sci. USA, 90 (1993) 237-241. [50] L.C. Weng, G.M. Baker, Biochemistry, 30 (1991) 5727-5733. [51] N.J. Watmough, M.R. Cheesman, C. Greenwood, A.J. Thomson, Biochem. J., 300 (1994) 469-475. [52] M. Fabian, G. Palmer, Biochemistry, 34 (1995) 13802-13810. [53] J. Wang, J. Rumbley, Y.C. Ching, S. Takahashi, R.B. Gennis, D.L. Rousseau, Biochemistry, 34 (1995) 9819-9825. [54] D.A. Proshlyakov, M.A. Pressler, G.T. Babcock, Proc. Natl. Acad. Sci. USA, 95 (1998) 8020-8025. [55] M.R.A. Blomberg, P.E.M. Siegbahn, G.T. Babcock, M. Wikström, J. Inorg. Biochem., in press. [56] M.R.A. Blomberg, P.E.M. Siegbahn, G.T. Babcock, M. Wikström, to be published. [57] M. Karpefors, P. Ädelroth, P. Brzezinski, private communication.
[58] M. Ralle, M.L. Verkhovskaya, J.E. Morgan, M.I. Verkhovsky, M. Wikström, N.J. Blackburn, Biochemistry, 38 (1999) 7185-7194. [59] R. Mitchell, P.R. Rich, Biochim. Biophys. Acta, 1186 (1994) 19-26. [60] M.R.A. Blomberg, P.E.M. Siegbahn, G.T. Babcock, J. Am. Chem. Soc., 120 (1998) 8812-8824. [61] F. MacMillan, A. Kannt, J. Sehr, W. Prisner, H. Michel, Biochemistry, 38 (1999) 9179-9184. [62] G.T. Babcock, private communication. [63] T.L. Poulos, J. Kraut, J. Biol. Chem., 255 (1980) 8199-8205. [64] M. Wirstam, M.R.A. Blomberg, P.E.M. Siegbahn, J. Am. Chem. Soc., 121 (1999) 10178-10185. [65] M. Wirstam, dissertation, Stockholm University, 2000. [66] A.L. Feig, S.J. Lippard, Chem. Rev., 94 (1994) 759-805. [67] B.J. Wallar, J.D. Lipscomb, Chem. Rev., 96 (1996) 2625-2657. [68] A.M. Valentine, S.J. Lippard, J. Chem. Soc. Dalton Trans. (1997) 3925. [69] A.C. Rosenzweig, P. Nordlund, P.M. Takahara, C.A. Frederick, S.J. Lippard, Chem. Biol., 2 (1995) 409-418. [70] S.-K. Lee, J.D. Lipscomb, Biochemistry, 38 (1999) 4423-4432. [71] J.C. Nesheim, J.D. Lipscomb, Biochemistry, 35 (1996) 10240-10247. [72] P.E.M. Siegbahn, R.H. Crabtree, J. Am. Chem. Soc., 119 (1997) 3103. [73] H. Basch, K. Mogi, D.G. Musaev, K. Morokuma, J. Am. Chem. Soc., 121 (1999) 7249-7256. [74] a. K. Yoshizawa, Y. Shiota, T. Yamabe, Organometallics, 17 (1998) 2825-2831, b. K. Yoshizawa, T. Ohta, Y. Shiota, T. Yamabe, Chem. Lett. (1997) 1213-1214, c. ibid., Bull. Chem. Soc. Jpn., 71 (1998) 1899. [75] P.E.M. Siegbahn, Inorg. Chem., 38 (1999) 2880-2889. [76] L. Shu, J.C. Nesheim, K. Kauffmann, E. Münck, J.D. Lipscomb, L. Que, Jr., Science, 275 (1997) 515. [77] B.D. Dunietz, M.D. Beachy, Y. Cao, D.A. Whittington, S.J. Lippard, R.A. Friesner, in press. [78] P.E.M. Siegbahn, to be published.
[79] V.V. Barynin, P.D. Hempstead, A.A. Vagin, S.V. Antonyuk, W.R. Melik-Adamyan, V.S. Lamzin, P.M. Harrison, P.J. Artymiuk, J. Inorg. Biochem., 67 (1997) 196. [80] V.V. Barynin, P.M. Harrison, P.J. Artymiuk, S.V. Antonyuk, V.S. Lamzin, M.M. Whittaker, J.W. Whittaker, to be published. [81] G.C. Dismukes, Chem. Rev., 96 (1996) 2909-2926. [82] M.M. Whittaker, V.V. Barynin, S.V. Antonyuk, J.W. Whittaker, Biochemistry, 38 (1999) 9126-9136. [83] A.E. Meier, M.M. Whittaker, J.W. Whittaker, Biochemistry, 35 (1996) 348-360. [84] P.E.M. Siegbahn, to be published. [85] J.E. Baldwin, E. Abraham, Nat. Prod. Rep., 5 (1988) 129-145. [86] R.L. White, E.M. John, J.E. Baldwin, E.P. Abraham, Biochem. J., 203 (1982) 791-793. [87] Z.A. Bainbridge, R.I. Scott, D. Perry, J. Chem. Tech. Biotechnol., 55 (1992) 233-238. [88] P.L. Roach, I.J. Clifton, V. Fülöp, K. Harlos, G.J. Barton, J. Hajdu, I. Andersson, C.J. Schofield, J.E. Baldwin, Nature, 375 (1995) 700-704. [89] J.E. Baldwin, M. Bradley, Chem. Rev., 90 (1990) 1079-1088. [90] J.E. Baldwin, G.P. Lynch, C.J. Schofield, Tetrahedron, 48 (1992) 9085-9100. [91] L. Que, Jr., R.Y.N. Ho, Chem. Rev., 96 (1996) 2607-2624. [92] M. Wirstam, P.E.M. Siegbahn, submitted.
L.A. Eriksson (Editor), Theoretical Biochemistry - Processes and Properties of Biological Systems, Theoretical and Computational Chemistry, Vol. 9, © 2001 Elsevier Science B.V. All rights reserved.
Chapter 4
Catalytic Reactions of Radical Enzymes
Fahmi Himo (a) and Leif A. Eriksson (b)
(a) Department of Physics, Stockholm University, Box 6730, 113 85 Stockholm, Sweden.
(b) Department of Quantum Chemistry, Box 518, Uppsala University, 751 20 Uppsala, Sweden.
1. Introduction
The number of enzymes found to harbour and employ a metastable radical site for their catalytic activity has steadily increased over the past decades [1]. Besides 'pure' radical enzymes, i.e., systems that use a stable radical for the catalytic action at the active site, theoretical studies indicate that radical intermediates are also employed in several other systems; see, e.g., the chapter by Siegbahn and Blomberg. In addition, many key reactions in biology make use of radical forms of cofactors. Examples are quinones in photosynthesis (see the chapter by Wheeler), the vitamin E controlled quenching of lipid peroxidation, and the various catalytic mechanisms involving radical forms of coenzyme B12 (see the chapter by Radom et al.). The form in which the radical nature is stored and employed hence differs significantly from system to system. The aim of the present chapter is to give a flavour of some of these aspects, with the key focus on radical enzymes. Radical enzymes are, like radical reactions in general, usually characterised by high turnover for the catalytic processes. It is hence rather difficult to study these reactions experimentally in order to gain direct insight into the mechanisms. In addition, several of the enzymes are membrane bound or anaerobic, which is why determining the crystal structures of many of them has been a formidable task. Most mechanistic proposals have thus been based on mutagenesis experiments (site-specific exchange of an amino acid), kinetic measurements and isotope effects, and studies of inhibitor 'by-products'. Despite their high reactivity, radical systems do, however, have one advantage over non-radical systems: the existence of unpaired spin. The unpaired spin leads to interaction between the magnetic moments of the
unpaired electron and the magnetic nuclei in the sample. These interactions are observable via electron spin/paramagnetic resonance techniques (ESR, EPR), as well as via more advanced related methods such as ENDOR and TRIPLE spectroscopies. The resulting hyperfine parameters can be used to identify the radical center and to single out protonation states and geometric conformers of the different radicals observed, thus providing further insight into the mechanisms. The radical hyperfine coupling constants (HFCCs) can also be computed with high accuracy. Most commonly, we report isotropic HFCCs (A_iso, one third of the trace of the diagonalized 3x3 hyperfine interaction tensor), also known as Fermi contact terms. These are essentially proportional to the unpaired spin density at r=0 (i.e. at each magnetic nucleus). Anisotropic data are, however, also reported. They are the remainder of the diagonalized HFCC tensors once the isotropic part has been removed, usually denoted Txx, Tyy and Tzz. If the full diagonal terms are reported, i.e., A_iso + Txx etc., these are labelled Axx etc. The anisotropic couplings describe the non-spherical spatial distribution of the unpaired spin around each nucleus, and are computed using dipole-dipole integrals. Experimentally, the so-called g-factors are also reported. These determine at what magnetic field or frequency the center of the spectrum is found, and they also provide important information about the nature of the radical in question. Computationally, however, g-factors are much more difficult to obtain than hyperfine couplings (to date only a few reports are available on biologically relevant systems). Throughout our reported data we will therefore assume the free-electron value for g, 2.0023. One of the advantages of theory is that we are able to test a large set of mechanistic proposals in order to determine which are energetically the most likely.
The mechanisms discussed herein are thus mainly those representing the energetically most favourable paths for model systems in vacuum (or in a dielectric cavity). Theoretical data should hence not be regarded as 'the truth' but rather as one more piece of information towards a complete understanding of how the various enzymes function, complementing the various experimental studies. We should also emphasize that for each of the systems discussed herein, several additional mechanisms and paths have been explored but discarded because of unreasonable energetics, spin distributions that do not agree with experimental findings, and similar criteria. Using computed HFCCs, geometric structures, and energy surfaces, we can hence construct a most probable reaction path for the catalytic mechanisms, and we will herein report on detailed theoretical studies of model systems of Galactose oxidase (GO) [2], Pyruvate formate-lyase (PFL) [3,4] and Ribonucleotide reductase (RNR) [5,6]. To finally verify the suggested mechanisms, the full enzymes (or as much as possible thereof) need to be taken into account, as well as temperature, dynamics, solvent, etc. We are presently witnessing the arrival of such approaches, combining different methodologies into, e.g., QM/MM methods or DFT-MD. We also refer the interested reader to some of the many excellent reviews available, where the experimental findings on the above systems are summarized [1,7,8].
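The statement earlier in this section that isotropic couplings are essentially proportional to the spin density at the nucleus can be made concrete with the standard non-relativistic, point-nucleus Fermi contact expression. The sketch below is illustrative only: the function name `fermi_contact` is hypothetical, the constants are CODATA 2018 values, and a doublet radical (S = 1/2) is assumed by default. It converts a spin density in atomic units (bohr^-3) into Aiso in MHz and in gauss, using g = 2.0023 as in the text.

```python
import math

# CODATA 2018 values (SI units)
MU0 = 1.25663706212e-6        # vacuum permeability, N A^-2
MU_B = 9.2740100783e-24       # Bohr magneton, J T^-1
MU_N = 5.0507837461e-27       # nuclear magneton, J T^-1
H_PLANCK = 6.62607015e-34     # Planck constant, J s
BOHR = 5.29177210903e-11      # Bohr radius, m
G_E = 2.0023                  # free-electron g value, as assumed in the text

# Nuclear g factors for the nuclei appearing in the tables of this chapter
G_N = {"1H": 5.5856947, "13C": 1.4048236}


def fermi_contact(spin_density_au, nucleus, spin=0.5):
    """Isotropic HFCC from the spin density rho(alpha) - rho(beta) at a nucleus.

    spin_density_au is in bohr^-3; spin is the total spin S of the radical.
    Returns (Aiso in MHz, Aiso in gauss).
    """
    rho_si = spin_density_au / BOHR**3                       # m^-3
    # Non-relativistic point-nucleus Fermi contact term, as an energy in J
    energy = (2.0 * MU0 / 3.0) * G_E * MU_B * G_N[nucleus] * MU_N * rho_si / (2.0 * spin)
    a_mhz = energy / H_PLANCK / 1e6
    # Field equivalent: 1 G = 1e-4 T corresponds to g_e * mu_B * 1e-4 / h
    mhz_per_gauss = G_E * MU_B * 1e-4 / H_PLANCK / 1e6
    return a_mhz, a_mhz / mhz_per_gauss
```

As a sanity check, the hydrogen atom 1s spin density at the nucleus, 1/π bohr^-3, gives about 1423 MHz (about 508 G) with `fermi_contact(1 / math.pi, "1H")`, close to the experimental 1420 MHz. Note that the value in gauss does not depend on the assumed g value, since it cancels between the coupling energy and the field conversion.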
2. Methodology
All systems outlined in the present work have been studied using the hybrid Hartree-Fock/density functional theory functional B3LYP [9]. Geometry optimizations and frequency calculations are throughout performed using a split-valence basis set, normally 6-31G(d,p), or LANL2DZ when the system contains transition metals [10]. In order to obtain reliable energetics and spin properties, B3LYP calculations are subsequently performed on these optimized structures using larger triple-zeta basis sets with multiple polarization functions, in some cases also including diffuse functions, i.e., 6-311(+)G(2df,p) or LANL2DZ(+)(2d,2p). In addition, the energetics are corrected for zero-point vibrational effects obtained with the smaller (geometry optimization) basis set. All calculations are performed using the Gaussian 94 and Gaussian 98 suites of programs [11,12].

When investigating enzymatic reactions using quantum chemical techniques, it is not possible to include the full enzyme. Instead, rather crude approximations have to be made with respect to the size of the model employed. At present, such 'all-quantum' models are limited to the range of 30-50 atoms. In the present work, different criteria have been employed to determine the sizes of the individual model systems. In Table 1 we list the S-H bond dissociation energy (BDE) for the amino acid cysteine, which is commonly employed by nature in radical enzyme reactions. As seen, extending the description of the cysteine side chain beyond a methyl group gives only very minor improvements in the S-H bond strength compared with the full amino acid. On the other hand, with the smallest possible model, HS-H, the BDE deviates by 5.4 kcal/mol from that of the amino acid, and this is obviously not a suitable model.

Table 1. Effects of model system size on the S-H bond strength in cysteine (kcal/mol), computed at the B3LYP/6-311+G(2d,2p) level.
System               S-H bond strength
HS-H                 87.1
CH3S-H               81.9
CH3CH2S-H            81.9
NH2(COOH)CHCH2S-H    81.7
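The composite scheme described above (large-basis electronic energies plus zero-point energies from the smaller basis) can be sketched as a small helper. The function name and its arguments are hypothetical; the ZPEs are used unscaled because the text does not mention a scaling factor, and no thermal corrections are included. The second part simply reproduces the comparison drawn from Table 1, measuring each model against the full amino acid.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree


def bond_dissociation_energy(e_parent, e_radical, e_h, zpe_parent, zpe_radical):
    """X-H BDE (kcal/mol) for R-X-H -> R-X* + H*.

    e_*   : electronic energies (hartree) from the large basis set
    zpe_* : unscaled zero-point energies (hartree) from the small basis set;
            the isolated H atom has no vibrational ZPE.
    """
    delta_elec = (e_radical + e_h) - e_parent
    delta_zpe = zpe_radical - zpe_parent
    return (delta_elec + delta_zpe) * HARTREE_TO_KCAL


# Table 1 values (kcal/mol): deviation of each model from the full amino acid.
TABLE_1 = {"HS-H": 87.1, "CH3S-H": 81.9, "CH3CH2S-H": 81.9, "NH2(COOH)CHCH2S-H": 81.7}
reference = TABLE_1["NH2(COOH)CHCH2S-H"]
deviations = {model: round(bde - reference, 1) for model, bde in TABLE_1.items()}
```

With the tabulated data, `deviations` shows 5.4 kcal/mol for HS-H but only 0.2 kcal/mol for the methyl and ethyl models, which is the basis for using CH3S-H as the cysteine model. Any energies passed to `bond_dissociation_energy` in testing are illustrative placeholders, not computed results.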
Another criterion for choosing the models is based on the radical hyperfine structures. As mentioned above, the hyperfine properties can also be employed to single out protonation states and geometric conformers of the different radicals observed, thus providing further insight into the mechanisms. Radical hyperfine couplings arise from the interaction between the magnetic moments of the unpaired electron and the nuclei, and may be divided into two sets: the isotropic components (one third of the trace of the diagonalized full 3x3 hyperfine interaction tensor), and the anisotropic 'remainder' of the diagonalized tensor. For detailed accounts of the theoretical evaluation of radical hyperfine tensors, the reader is referred to, e.g., references [13] and [14]. One example illustrating this particular aspect of model construction is the glycyl radical present in PFL and anaerobic RNR. In this case, the amino acid side chain consists of a hydrogen atom only. Removing one of the Hα's (i.e., the side chain) hence leads to the formation of a backbone-centred radical, in which the unpaired spin may delocalize out into the backbone rather than (as in most other cases) remaining localized on the side chain. In Table 2 we list some computed HFCCs for a series of models of the radical center in the glycyl radical. In this case it is clear that neither a small model nor the amino acid alone is sufficient; the more extended form is needed in order to accurately describe the distribution of the unpaired spin and the radical properties, and thus to provide an accurate model of the system's reactivity.

Table 2. Cα and Hα isotropic HFCCs (gauss) for a sequence of model systems, from methyl radical to extended glycyl, computed at the B3LYP/6-311+G(2d,2p) level.
System                            Aiso(1Hα)    Aiso(13Cα)
CH3                               -22.8        25.9
CH2-CH3                           -21.9        29.2
NH2-CH2                           -10.1        44.4
CH2-CHO                           -17.6        19.3
NH2-CH-CHO                        -11.5        7.3
NH2-CH-CO-NH2                     -14.2        11.5
CHO-NH-CH-CHO                     -13.8        11.7
CHO-NH-CH-CO-NH2                  -16.4        15.5
CH3-CHO-NH-CH-CO-NH-CH3           -16.4        15.3
Exp. E. coli PFL [15]             (-)15        16-21
Exp. E. coli anaer. RNR [16]      (-)14-15     15-21
In addition to the above procedure for determining the smallest possible (yet accurate) models, we have also applied a few additional criteria. The computations are normally conducted in the gas phase, i.e., all interactions with the surrounding medium (solvent and remaining enzyme) are neglected. In certain cases, however, a dielectric medium based on the polarizable continuum model (PCM) [17] is introduced. The protein backbone is for the most part excluded, assuming the reactions to be essentially localized to the side chains. However, when two neighbouring amino acids are involved in the same reaction, the backbone connecting them is retained in order to mimic the restricted motion of the side chains relative to each other. We furthermore try to use charge-neutral models as far as possible, owing to the poor convergence properties of charged species in vacuum. Apart from this, the reacting groups are always allowed to move freely so as to obtain the lowest-energy conformers at all stages of the reactions. The exclusion of the remaining enzyme may be regarded as the most severe shortcoming of the present approach. However, the energy surfaces thus obtained can still provide valuable insights into the catalytic mechanisms and be used to discriminate between alternative pathways. In addition, it has now been shown in numerous examples that, once an appropriate model of the active site is used, the geometric arrangements, kinetic data, and possible reaction mechanisms normally agree strikingly well with X-ray crystal structures and various experimental data, despite the use of a small model system in the gas phase. For each of the systems studied, a large number of possible alternative pathways, intermediates, etc., are always investigated. The results shown in the present chapter illustrate the best possible sets of data within the approximations imposed, but should not be viewed as representing 'the full truth'.
3. Galactose Oxidase
Galactose Oxidase (GO) from the filamentous wheat-root fungus Fusarium spp. is a mononuclear type 2 copper enzyme that catalyzes the two-electron oxidation of a large number of primary alcohols to their corresponding aldehydes, coupled with the reduction of dioxygen to hydrogen peroxide [1,18]:
$$\mathrm{RCH_2OH + O_2 \rightarrow RCHO + H_2O_2}$$
The protein is a single polypeptide with a molecular mass of ca 68 kDa. To perform the two-electron chemistry, the enzyme utilizes, in addition to the copper center, a protein radical cofactor, which has been assigned to the Tyr272 residue. GO can exist in three distinct oxidation states: the highest state with Cu(II) and tyrosyl radical, the intermediate state with Cu(II) and tyrosine, and the lowest state with Cu(I) and tyrosine. The highest oxidation state is the catalytically active one. The protein radical couples antiferromagnetically with the copper ion, resulting in an EPR-silent species. The X-ray crystal structure of the active form of GO has been determined at pH 4.5 and 7.0 [19]. The copper site was found to be close to the surface, with essentially square-pyramidal coordination: Tyr495 in the axial position, and Tyr272, His581, His496 and a water or acetate (to be replaced by substrate) in the equatorial positions (Figure 1). Interestingly, Tyr272 was found to be cross-linked to a cysteine residue (Cys228) through a thioether bond at the position ortho to the phenol OH. The Tyr-Cys moiety is, moreover, π-stacked with a tryptophan residue (Trp290), which also controls entry to the active site. Another interesting feature of the active site is the direct backbone link between the consecutive amino acids Tyr495 (axial) and His496 (equatorial).

Figure 1. Crystal structure of the active site of galactose oxidase. The substrate is believed to replace the exogenous water in the equatorial position.

The catalytic mechanism proposed for GO is shown in Figure 2 [20]. After the substrate binds to the equatorial copper position (occupied by water or acetate in the crystal structures), the first step is a proton transfer from the alcohol to the axial tyrosinate (Tyr495). Next, a hydrogen atom is transferred from the substrate to the modified tyrosyl radical. This step is known from isotope substitution experiments to be at least partially rate-limiting, and probably the major rate-limiting step. The resulting substrate-derived ketyl radical is then oxidized through electron transfer to the copper center, yielding Cu(I) and the aldehyde product. Based on experiments with various inhibitors, Branchaud and co-workers have suggested that the two latter steps might occur simultaneously in a concerted manner [21]. Finally, Cu(I) and tyrosine are oxidized by molecular oxygen, regenerating Cu(II) and tyrosyl and giving hydrogen peroxide as product.

Figure 2. Proposed reaction mechanism for galactose oxidase. [Scheme: Step 1, proton transfer; Step 2, hydrogen atom transfer; Step 3, electron transfer; reduction of dioxygen to H2O2.]

In accordance with the general guidelines given above for the choice of chemical models, the two histidines were modeled by imidazoles and the equatorial tyrosine by an SH-substituted phenol, whereas the somewhat smaller, but fully adequate, vinyl alcohol served as model for the axial tyrosine. The simplest alcohol, methanol, was used as substrate. The rest of the phenol ring of the axial tyrosine and the backbone link between it and the equatorial histidine (His496) were included as molecular mechanics atoms, using the IMOMM (Integrated Molecular Orbital/Molecular Mechanics) hybrid method [22]. This method uses quantum mechanical and molecular mechanics descriptions for different parts of the system, and it has proven successful in quantifying steric effects in a number of organometallic applications [23]. As in the case of PFL (see below), a charge-neutral model was used for galactose oxidase. This model implies that one of the histidine ligands needs to be deprotonated in order to obtain the correct oxidation state of the copper atom.

We start the discussion of the mechanism of galactose oxidase by noting the following. From experimental heats of formation one can calculate that the net reaction catalyzed by the enzyme, with methanol as substrate,
$$\mathrm{CH_3OH + O_2 \rightarrow CH_2O + H_2O_2}$$
is exothermic by 10.7 kcal/mol [24] (in our calculations 6.7 kcal/mol). The fact that the reaction is catalyzed by the enzyme does not change this total exothermicity. This sets some restrictions on the energetics involved in the catalyzed reaction. For instance, the proposed electron transfer from the ketyl
radical anion to the Cu(II) center cannot be very exothermic, since this would render the oxygen reduction steps rate-limiting, with a barrier higher than the 14 kcal/mol estimated for the H-atom transfer between the substrate and the Tyr-Cys system. In the calculations, the first step, the proton transfer from the substrate to Tyr495 (Step 1 in Figure 2), was found to occur with a very low barrier (less than 3 kcal/mol). The exothermicity was calculated to be 3.2 kcal/mol. One of the important results to come out of the calculations is that the radical site prior to the proton transfer (1 in Figure 2) is not the equatorial cysteine-substituted tyrosine residue, but rather the axial tyrosine (see the spin distribution in Figure 3A). The axial position is the weakest one in the square-pyramidal coordination of Cu(II), and thus the most natural place for the radical to reside. A series of model calculations with different ligands, ranging from simple OH and water to full phenols and imidazoles, was performed, and all of them consistently place the radical axially in a Cu(II) complex (data not shown). Stack and co-workers [25] have synthesized model complexes that mimic both the spectroscopic characteristics and the catalytic activity of galactose oxidase. For these complexes, EXAFS and edge XAS experiments indicate that the radical is most likely located axially in the non-square-planar coordination of the copper. Calculations by Rothlisberger and Carloni [26] on these model systems confirm this. We also recommend the chapter herein by that group, in which the full reaction mechanism of GO has been investigated using Car-Parrinello MD methods.
Figure 3. Optimized structures for the galactose oxidase active site before (A) and after (B) the proton transfer from the substrate to the axial tyrosine.
In our calculations [2], the radical is located at the equatorial tyrosine after the proton transfer (Figure 3B), implying that an electron is moved from the equatorial tyrosine to the axial one simultaneously with the proton transfer. Stack's model systems and the GO active site hence behave very similarly, which shows that it is not the protein matrix that keeps the radical in the equatorial position. The difference is rather that in the model complexes neither of the two phenolic oxygens is protonated, so that the optimal radical site remains in the weak axial position. The substrate used in the model compounds is an alcoholate (⁻OCH3), which, in contrast to the GO substrate, cannot donate a proton to the axial phenol and thereby move the radical site to the equatorial tyrosine.

The second step in the proposed mechanism of GO is a hydrogen atom transfer from the substrate to the Tyr272 radical (Step 2 in Figure 2). On the basis of isotope substitution experiments, this step has been shown to be at least partially rate-limiting and probably the major rate-limiting step. Figure 4 shows the optimized structure for the transition state of this hydrogen atom transfer. The barrier was calculated to be 13.6 kcal/mol. The turnover rates of alcohols are known to exhibit strong substituent effects [27]. For instance, galactose has a turnover rate of 800 s⁻¹, while for ethanol it is only 0.02 s⁻¹. Assuming the hydrogen atom transfer to be fully rate-determining, a barrier of ca 14 kcal/mol can be estimated from the kinetic data for galactose as substrate [28]. For ethanol, the barrier can be estimated to be ca 6 kcal/mol higher than for galactose. Although the DFT-calculated barrier (for methanol) is somewhat lower than the experimental estimate, it does indeed provide strong support for the proposed mechanism.

Figure 4. IMOMM(B3LYP/MM3) optimized geometry of the transition state for the proposed rate-limiting hydrogen atom transfer.

As seen in Figure 4, the critical C-H bond has stretched to 1.36 Å at the transition state, and the H-O bond about to form is 1.24 Å. At the transition state, one spin is located at the copper (S=0.49), and the other is shared by the tyrosine and the substrate (S=0.15 and S=0.49, respectively). The hydrogen atom carries 0.07 of the unpaired spin. Consistent with the tyrosyl radical being a π-radical, we note that the hydrogen atom is transferred perpendicularly to the phenol ring plane. Stretching the phenol O-H bond in the plane of the ring would instead lead to a high-energy σ-radical. The hydrogen atom transfer is proposed to result in a substrate-derived ketyl radical (3), which would then be oxidized through electron transfer to the copper center, yielding Cu(I) and the aldehyde product (4). As mentioned above, these two steps have been proposed by Branchaud and co-workers to occur in a concerted manner [21].
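The step from turnover numbers to barrier heights used above is transition state (Eyring) theory. The sketch below reproduces that arithmetic under explicit assumptions: a transmission coefficient of 1, a default temperature of 298.15 K (the temperature used in [28] is not stated here), and kcat taken directly as the rate constant of the hydrogen atom transfer. The function names are hypothetical.

```python
import math

KB = 1.380649e-23           # Boltzmann constant, J K^-1
H_PLANCK = 6.62607015e-34   # Planck constant, J s
R_KCAL = 1.987204e-3        # gas constant, kcal mol^-1 K^-1


def eyring_barrier(rate_constant, temperature=298.15):
    """Free-energy barrier (kcal/mol) from a first-order rate constant (s^-1).

    Transition state theory with a transmission coefficient of 1:
    k = (kB T / h) exp(-dG / RT)  =>  dG = -RT ln(k h / (kB T)).
    """
    prefactor = KB * temperature / H_PLANCK          # ~6.2e12 s^-1 at 298 K
    return -R_KCAL * temperature * math.log(rate_constant / prefactor)


def barrier_difference(k_fast, k_slow, temperature=298.15):
    """Increase in barrier (kcal/mol) implied by the rate ratio k_fast / k_slow."""
    return R_KCAL * temperature * math.log(k_fast / k_slow)


# Turnover numbers quoted in the text (s^-1)
K_GALACTOSE = 800.0
K_ETHANOL = 0.02
```

Under these assumptions, `eyring_barrier(K_GALACTOSE)` gives about 13.5 kcal/mol (about 14 kcal/mol at an assumed 310 K), and `barrier_difference(K_GALACTOSE, K_ETHANOL)` gives about 6.3 kcal/mol, in line with the ca 14 and ca 6 kcal/mol estimates quoted above. The barrier difference depends only on the rate ratio and temperature, so it is less sensitive to the prefactor assumption than the absolute barrier.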
Figure 5. IMOMM(B3LYP/MM3) optimized structure of the ketyl radical intermediate (structure 3 in Figure 2).
By moving from the transition state structure towards the product, we were able to locate the proposed radical intermediate in the IMOMM calculation (Figure 5). The energy of the intermediate is 4.9 kcal/mol below the transition state, making the hydrogen atom transfer step endothermic by 8.7 kcal/mol. Without the MM part, all attempts to locate the ketyl radical intermediate failed, indicating that this intermediate is very unstable, with a very small barrier for its collapse to the closed-shell Cu(I) and aldehyde product. In practice, this radical intermediate is therefore probably impossible to detect. The ketyl radical intermediate (3) is hence unstable and will readily reduce the copper center, yielding Cu(I) and aldehyde (4). Wachter and Branchaud [28] estimated this electron transfer (ET) step to be exothermic by 36 kcal/mol. Owing to the small size of the model employed, the closed-shell Cu(I) + aldehyde system (4) will, when optimized, adopt a strongly distorted geometry. Although the backbone link between Tyr495 and His496 included in the IMOMM calculations reduces the distortion somewhat, this species is clearly overstabilized in the model calculations, and its energy can only serve as an upper limit to the amount of energy that can be gained. The exothermicity is calculated to be around 14 kcal/mol relative to the ketyl radical intermediate. The protein matrix will of course prevent such a large distortion, and we estimate the energy of the complex with proper coordination, before the distortion, to be around 5 kcal/mol lower than that of the ketyl radical intermediate. This estimate is based on the energy of the system in the first few steps of the geometry optimization, i.e., before the distortion becomes very large. Energy is instead gained through the binding and one-electron reduction of dioxygen.
Assuming that the aldehyde product is released at this stage and that dioxygen occupies its coordination position, O2⁻ is found to bind to copper 20.6 kcal/mol more strongly than the substrate alcoholate (⁻OCH3). The structure of this complex is shown in Figure 6. This energy is reasonable because, as discussed above, a large exothermicity in this step would render the reduction of O2 rate-limiting. The calculated potential energy curve for the steps discussed above is displayed in Figure 7. The role of the tyrosine-cysteine cross-link is of fundamental mechanistic interest. It has been suggested that this thioether bond is in part responsible for the 0.5-0.6 V lowering of the oxidation potential of this species compared with normal tyrosine [29]. The C228G mutant has, furthermore, been shown to have 10,000 times lower activity than the wild-type enzyme, and it also migrates more slowly on gel electrophoresis [30]. The crystal structure of the mutant did not show much change in the main chain or the copper binding site, except, of course, for the missing thioether bond. It was proposed that
destabilization of the tyrosyl radical due to less electron delocalization caused this activity decrease.

Figure 6. IMOMM(B3LYP/MM3) optimized structure of O2⁻ bound to the GO active site.

There is, however, much evidence that the cysteine link causes only small perturbations in the electronic structure and energetics of tyrosine. Electrochemical experiments by Whittaker et al. [31] showed that the pKa of o-methylthiocresol was only 0.7 pH units lower than that of cresol (9.5 vs. 10.2). Babcock and co-workers have shown, based on EPR and ENDOR experiments on both the apo-enzyme and model alkylthio-substituted phenoxyl radicals, that the sulfur cross-link induces only a small perturbation in the spin distribution of the tyrosyl radical [32]. No large shift in the g-tensors between unsubstituted and methylthio-substituted radicals was observed. Since such a shift is expected when heavy elements carry some of the spin in organic radicals, the conclusion was that the sulfur center carries only a small part of the unpaired spin. We have conducted ab initio multiconfigurational linear response g-value calculations on unsubstituted and sulfur-substituted phenoxyl radicals and shown that the shift in the g-tensor is as small as 0.0008 in the gxx component (2.0087 vs. 2.0079). The other components were virtually unchanged, thus confirming the experimental results [33]. By means of density functional calculations, it was also shown that the thioether bond has very small effects on the hyperfine couplings and spin distributions [34]. The odd-alternant spin pattern of the tyrosyl radical was
essentially unbroken, with the sulfur center carrying only ca 0.12 of the unpaired spin. The full spin density distributions, calculated with several different density functionals, are displayed in Table 3. We have also calculated the sulfur substituent effect on the O-H bond dissociation energy (BDE) of phenol [35]. The BDE of the sulfur-substituted phenol was found to be only 1.7 kcal/mol lower than that of the unsubstituted species. As for the effects of the cysteine cross-link on the catalytic mechanism, the calculations were redone without the sulfur linkage. Both the energetics and the geometrical structures were found to be almost identical. For example, the barrier for the critical hydrogen atom transfer step was calculated to be about 1 kcal/mol higher with the sulfur moiety included, i.e., the sulfur link actually makes this step proceed somewhat more slowly, but this difference of course falls within the error margin of the methods used. Evidently, the thioether bond has a very small electronic effect on the tyrosine. The role of the cross-link could instead be structural, keeping things in place.

Table 3. Mulliken spin population distributions for unsubstituted (Tyrosyl) and ethylthio-substituted (Tyrosyl-S) tyrosyl radicals, calculated with various DFT functionals.
Radical     Functional   C1     C2     C3     C4     C5     C6     O      S
Tyrosyl     B3LYP        0.40  -0.16   0.31  -0.11   0.31  -0.16   0.43   n/a
Tyrosyl     BLYP         0.35  -0.10   0.26  -0.05   0.26  -0.10   0.39   n/a
Tyrosyl     B3P86        0.41  -0.17   0.32  -0.11   0.32  -0.17   0.42   n/a
Tyrosyl     PWP86        0.36  -0.10   0.26  -0.03   0.25  -0.09   0.37   n/a
Tyrosyl-S   B3LYP        0.34  -0.16   0.28  -0.06   0.23  -0.11   0.35   0.11
Tyrosyl-S   BLYP         0.28  -0.10   0.23  -0.00   0.18  -0.05   0.31   0.14
Tyrosyl-S   B3P86        0.35  -0.17   0.28  -0.05   0.24  -0.11   0.35   0.12
Tyrosyl-S   PWP86        0.28  -0.09   0.21   0.02   0.18  -0.03   0.28   0.15
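The claims drawn from Table 3 (little spin on sulfur, an essentially intact odd-alternant pattern) can be checked directly against the tabulated numbers. This small sketch assumes that the Tyrosyl-S rows follow the same functional order as the Tyrosyl rows; the function name is hypothetical, and a zero population is counted as non-positive when testing the sign alternation.

```python
# Mulliken spin populations from Table 3, ordered C1, C2, C3, C4, C5, C6, O (, S).
TYROSYL = {
    "B3LYP": [0.40, -0.16, 0.31, -0.11, 0.31, -0.16, 0.43],
    "BLYP":  [0.35, -0.10, 0.26, -0.05, 0.26, -0.10, 0.39],
    "B3P86": [0.41, -0.17, 0.32, -0.11, 0.32, -0.17, 0.42],
    "PWP86": [0.36, -0.10, 0.26, -0.03, 0.25, -0.09, 0.37],
}
TYROSYL_S = {
    "B3LYP": [0.34, -0.16, 0.28, -0.06, 0.23, -0.11, 0.35, 0.11],
    "BLYP":  [0.28, -0.10, 0.23, -0.00, 0.18, -0.05, 0.31, 0.14],
    "B3P86": [0.35, -0.17, 0.28, -0.05, 0.24, -0.11, 0.35, 0.12],
    "PWP86": [0.28, -0.09, 0.21, 0.02, 0.18, -0.03, 0.28, 0.15],
}


def sulfur_effect(parent, substituted):
    """For each functional return (spin on S, change in O spin, alternation kept?)."""
    result = {}
    for functional, pops in parent.items():
        sub = substituted[functional]
        spin_s = sub[7]
        d_oxygen = round(sub[6] - pops[6], 2)
        # Odd-alternant pattern: C1, C3, C5 positive; C2, C4, C6 not positive
        alternates = all((sub[i] > 0) == (i % 2 == 0) for i in range(6))
        result[functional] = (spin_s, d_oxygen, alternates)
    return result


effects = sulfur_effect(TYROSYL, TYROSYL_S)
mean_spin_s = sum(e[0] for e in effects.values()) / len(effects)
```

For the data as tabulated, sulfur carries 0.11-0.15 of the spin (mean 0.13), the phenoxyl oxygen loses 0.07-0.09, and the strict sign alternation on the ring survives in every case except PWP86, where C4 turns slightly positive (0.02).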
To summarize this section, the theoretical calculations [2] strongly support the mechanism proposed for galactose oxidase. It was shown that the proton transfer step proposed to initiate the oxidation of the substrate is very fast and only slightly exothermic. The rate-limiting hydrogen atom transfer step has a feasible calculated barrier of 13.6 kcal/mol. The proposed short-lived ketyl radical intermediate has been located, and it was argued that the subsequent electron transfer from it to Cu(II) cannot be very exothermic, since high exothermicity at that point would render the reduction of O2 rate-limiting. The radical site prior to the initiating proton transfer is, moreover, proposed to be located at the axial tyrosine (Tyr495), rather than at the equatorial thioether-substituted Tyr272 as previously suggested. This would bring GO into agreement with the model experiments by Stack, where there are strong indications that the radical resides axially. Finally, the cysteine cross-link was shown to have a very small effect on the energetics and spin properties of the system.
Figure 7. Calculated, IMOMM(B3LYP/MM3), potential energy surface for the mechanism of galactose oxidase. [Plot: relative energy (kcal/mol) versus reaction coordinate; labeled steps: proton transfer, hydrogen atom transfer, e⁻ transfer, O2 binding, O2 reduction.]
4. Pyruvate Formate-Lyase
Pyruvate formate-lyase (PFL) catalyzes the reversible conversion of pyruvate and CoA into acetyl-CoA and formate [36,37]:
$$\mathrm{CH_3COCO_2^-} + \text{CoA-SH} \rightleftharpoons \text{CoA-S-COCH}_3 + \mathrm{HCO_2^-}$$
The enzyme is a homodimer composed of 85 kDa monomers and is essential for anaerobic glucose metabolism in Escherichia coli and other bacteria. PFL exhibits two-step ping-pong kinetics with an acetylated enzyme intermediate. The catalytic power is high, with kcat = 770 s⁻¹ for the forward direction and kcat = 260 s⁻¹ for the backward direction. The active enzyme contains a stable organic radical, which has been assigned to the Gly734 residue [15]. This was the first example of a radical enzyme with the radical located on the protein main chain. A glycyl radical has also been found in anaerobic ribonucleotide reductase [16,38]. The stability of the glycyl radical is usually explained by means of the so-called capto-dative effect [37]. This occurs when the radical center is located between an electron donor (the amino group) and an electron acceptor (the amide carbonyl). The combined effect of these two groups gives an enhanced resonance stabilization of the radical. In the calculations [3], protein-bound glycine is modeled by adding a peptide bond on each side of the glycine, CHO-NH-CH2-CO-NH2 (see Figure 8).
Figure 8. Optimized geometries of the model used for protein-bound glycine and its Cα-radical. Mulliken spin populations are also shown.
As seen from the figure, the spin is also delocalized to the backbone of the adjacent amino acids. The spin populations on the carbonyl oxygen and the nitrogen next to the radical carbon (0.11 and 0.06, respectively) clearly confirm the capto-dative hypothesis. However, there is spin also further out in the backbone chain: the oxygen and the carbon of the next carbonyl have 0.11 and 0.04, respectively, and the nitrogen on the opposite side has 0.05. From these spin populations, and also from the bond distances displayed in Figure 8, additional resonance structures can be drawn for protein-bound glycyl. These are displayed in Figure 9. An extended model of glycine is hence important in order to account for the resonances in the protein-bound radical. This effect was also seen in Table 2, illustrating the dependence of the radical HFCCs on the system size.
Figure 9. Resonance structures present in protein-bound glycyl radical. A-C represent the so-called capto-dative effect, whereas D-F are resonances due to the backbone of neighbouring amino acids.
It has been established, by means of site-directed mutagenesis experiments, that three amino acid residues are essential for the overall catalysis, namely the glycyl radical (Gly734) and two consecutive cysteines at positions 418 and 419. Two mechanisms had been proposed for PFL at the time the theoretical study was performed. Although based on the same experimental information, these two mechanisms are quite different (Figures 10 and 11). In the mechanism proposed by Knappe and co-workers [36] (Figure 10), the protein radical is transferred from glycyl to Cys418 after pyruvate has added to Cys419, forming a thiohemiketal moiety. The thiyl radical of Cys418 then forms an adduct with the carboxyl of the thiohemiketal. The next step is an intramolecular hydrogen atom transfer, yielding an alkoxy radical intermediate, which then undergoes homolytic C-C bond cleavage. Another intramolecular hydrogen atom transfer occurs in the Cys418-formate radical adduct, resulting in an oxy radical that dissociates to form free formate and the Cys418 radical. The thiyl radical at Cys418 is finally quenched by Gly734, completing the first half-reaction. The subsequent transfer of the acetyl group from Cys419 to CoA is proposed to use Cys418 as a nucleophilic relay. The second mechanism is due to Kozarich and co-workers [39] (Figure 11). The initial step here is the abstraction of a hydrogen atom from Cys419 by the glycyl radical, forming a transient thiyl radical. Addition of this thiyl radical to the keto group of pyruvate results in the formation of a tetrahedral oxy-radical intermediate. This intermediate collapses into an acetylated cysteine and a formyl radical, which is then reduced to formate by hydrogen atom abstraction from the glycine residue, thereby regenerating the stable glycyl radical. A transesterification step between C419 and C418 takes place before the reaction is completed by the CoA-dependent thioester exchange.

Figure 10. The reaction mechanism proposed by Knappe and co-workers [36].
Figure 11. The reaction mechanism proposed by Kozarich and co-workers [39].
As discussed above, a large model of the glycyl residue (CHO-NH-CH2-CO-NH2) proved necessary in the calculations in order to correctly account for the resonances in the protein-bound glycyl. Cysteine was modeled by methylthiol, HSCH3, according to the discussion in the Methodology section above. In this study, too, we chose to work with a charge-neutral model, i.e., the total charge of the species considered was chosen to be zero. Pyruvate was accordingly modeled by pyruvic acid, and formate by formic acid. The first step of the catalytic cycle proposed by Kozarich is the creation of a transient thiyl radical at the Cys419 position. Assuming that the cysteines are in close spatial proximity to the glycyl radical, the barrier for the direct hydrogen transfer reaction between glycyl and cysteine was calculated to be 9.9 kcal/mol. The Cα-H bond strength of the glycine model used is 79.3 kcal/mol and the S-H bond strength of the cysteine model is 81.9 kcal/mol. This makes the hydrogen atom transfer from cysteine to glycyl endothermic by 3.4 kcal/mol. The optimized structure for the transition state is displayed in Figure 12. The spin at the transition state is distributed mainly on the sulfur and the glycyl Cα centers (0.42 and 0.43, respectively). However, the carbonyl oxygens also carry some spin, 0.05-0.07.
Figure 12. Optimized transition state structure for the direct hydrogen atom transfer from cysteine to the glycyl radical.

The next step is the addition of the thiyl radical to the carbonyl carbon of pyruvate, yielding a tetrahedral oxy-radical intermediate. The calculated energy of this intermediate, relative to the free reactants, is +9.9 kcal/mol, and the barrier for its formation is calculated to be 12.3 kcal/mol. The barrier for the dissociation of the radical intermediate into acetylated cysteine and formyl radical is calculated to be only 2.8 kcal/mol, with an exothermicity of 3.9 kcal/mol. Taken together, the total reaction

Cys419• + pyruvate → Cys419-acetyl + formyl•

is hence endothermic by 6.0 kcal/mol. These results show that the proposed scenario, involving the tetrahedral radical intermediate, is indeed energetically plausible. The structures and spin distributions of the two transition states (TS1 and TS2) and of the tetrahedral intermediate are shown in Figure 13.
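The energy bookkeeping behind the 6.0 kcal/mol figure can be made explicit. The minimal Python sketch below places every species on a common scale relative to the free reactants, using only the four step quantities quoted above; the function name `build_profile` and its parameter names are illustrative, not taken from the original work.

```python
def build_profile(barrier_1, e_intermediate, barrier_2, exo_2):
    """Return (label, energy) pairs in kcal/mol, relative to the free reactants.

    barrier_1      -- barrier for thiyl addition (TS1), relative to the reactants
    e_intermediate -- energy of the tetrahedral oxy-radical intermediate
    barrier_2      -- dissociation barrier (TS2), measured from the intermediate
    exo_2          -- exothermicity of the dissociation, measured from the intermediate
    """
    return [
        ("Cys-S• + pyruvate", 0.0),
        ("TS1", barrier_1),
        ("tetrahedral intermediate", e_intermediate),
        ("TS2", e_intermediate + barrier_2),
        ("Cys-acetyl + formyl•", e_intermediate - exo_2),
    ]


# Values quoted in the text for the methylthiyl + pyruvic acid model
profile = build_profile(barrier_1=12.3, e_intermediate=9.9, barrier_2=2.8, exo_2=3.9)
for label, energy in profile:
    print(f"{label:26s} {energy:6.1f} kcal/mol")

# The highest point on the path limits the overall step
highest_label, highest_energy = max(profile, key=lambda item: item[1])
print(f"highest point: {highest_label} at {highest_energy:.1f} kcal/mol")
```

Because the dissociation barrier is measured from the intermediate, TS2 ends up at 9.9 + 2.8 = 12.7 kcal/mol, slightly above TS1 (12.3 kcal/mol), in line with the energies given for Figure 13.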
Figure 13. Formation and collapse of the tetrahedral oxy-radical intermediate. Optimized structures for A) the transition state of thiyl radical addition to pyruvate (TS1), B) the tetrahedral oxy-radical intermediate, and C) the transition state for dissociation of the formyl radical (TS2). The energies of these species are 12.3, 9.9 and 12.7 kcal/mol, respectively, relative to the energy of (methylthiyl + pyruvic acid).

The spin at TS1 is distributed on the sulfur atom (0.52), the carbonyl oxygen (0.24) and the carboxylic CO group (0.21). The S-C2 distance is 2.13 Å and the C2-Ccarboxyl bond is elongated to 1.66 Å (1.55 Å in pyruvate). The tetrahedral radical intermediate exhibits a somewhat different spin pattern: the spin is mainly concentrated on two centers, the sulfur atom (0.49) and the carbonyl oxygen (0.55). The carbonyl C2-O bond length (1.34 Å) is clearly of single-bond character. The S-C2 distance is 1.92 Å and the C2-Ccarboxyl bond is 1.56 Å. At the second transition state (TS2), the spin density moves over to the carboxylate group (0.58), although some of the radical character still remains on the sulfur (0.23) and the carbonyl oxygen (0.20). The critical C2-Ccarboxyl bond is 1.98 Å and the S-C2 distance is 1.88 Å.

Heterolytic addition of thiol to pyruvate was also considered. The direct thiol attack at the carbonyl carbon of pyruvate, yielding a nonradical tetrahedral intermediate, was found to have a very high barrier (37.6 kcal/mol), although the intermediate itself has a plausible energy, 1.6 kcal/mol above the free reactants (optimized structures are given in Figure 14). A direct attack is thus ruled out. A much more plausible barrier is found when a carboxyl group is allowed to mediate the reaction: the barrier then drops dramatically to 12.0 kcal/mol (structure shown in Figure 14). This is clearly competitive with the radical reaction, provided a residue containing a carboxylate group (Glu or Asp) is present to do the catalysis. No such group is, however, known to participate in the catalytic reaction of PFL.
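To put these barriers in perspective, a minimal sketch converts them into rate constants with the Eyring equation, k = (k_B T/h) exp(-ΔG‡/RT). For illustration only, it assumes that the computed barrier energies can stand in for activation free energies and that the transmission coefficient is 1; neither assumption is made in the text. The helper name `eyring_rate` is hypothetical.

```python
import math

# Physical constants (SI) and the gas constant in kcal/(mol K)
K_B = 1.380649e-23             # Boltzmann constant, J/K
H = 6.62607015e-34             # Planck constant, J s
R_KCAL = 8.314462618 / 4184.0  # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal, temperature=298.15):
    """Eyring rate constant in s^-1, treating the barrier as dG‡ and kappa = 1."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))


barriers = {
    "direct thiol attack": 37.6,
    "carboxyl-mediated attack": 12.0,
    "thiyl radical addition (TS1)": 12.3,
}
for name, barrier in barriers.items():
    print(f"{name:30s} {barrier:5.1f} kcal/mol  k = {eyring_rate(barrier):.2e} s^-1")

# The prefactor cancels in the ratio; only the 25.6 kcal/mol difference matters
ratio = eyring_rate(12.0) / eyring_rate(37.6)
print(f"mediated / direct rate ratio: {ratio:.1e}")
```

Under these assumptions the carboxyl-mediated pathway is faster than direct attack by a factor of roughly 6 × 10^18 at 298 K, so the direct attack is ruled out whatever the exact prefactor.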
Figure 14. Optimized structures for A) the non-radical tetrahedral intermediate formed upon addition of cysteine to pyruvate, B) the transition state for direct thiol attack on the carbonyl of pyruvate, and C) thiol attack mediated by a carboxyl group.

The tetrahedral intermediate of the radical pathway will readily dissociate and release the formyl radical. Kozarich et al. proposed that this reactive radical abstracts a hydrogen atom from Gly734, thereby regenerating the stable enzyme radical at that site. The calculations show that this proposal is perfectly feasible, with a barrier of 4.9 kcal/mol and an exothermicity of 17.5 kcal/mol. However, bearing in mind that the subsequent step is an acetyl group transfer between the cysteines, we proposed that the formyl radical may instead abstract a hydrogen from Cys418 rather than from Gly734. The barrier for this is very low (1.1 kcal/mol), although the exothermicity (14.1 kcal/mol) is slightly lower than for the reaction with glycine described above. The thiyl radical thus created at Cys418 then allows a radical mechanism for the transfer of the acetyl group from Cys419 to Cys418. Kozarich et al. originally proposed this transfer for two reasons. First, in acetylated PFL (the product of adding pyruvate to activated PFL in the absence of CoA), the glycyl can still exchange its hydrogen; since Cys419 is known to be required for that exchange, this indicates that Cys418 is the site of acetylation. Second, site-directed mutagenesis experiments show that Cys418 is the primary residue participating in the thioester exchange with CoA, since thioester exchange to CoA is observed for the C419S mutant but not for C418S. Note, however, that both suggested mechanisms are consistent with the experimental observations. Homolytic transfer of the acetyl group between the cysteines has a computed barrier of a quite reasonable 11.6 kcal/mol; the transition state structure for this reaction is presented in Figure 15. This thermoneutral reaction proceeds without the intermediacy of a tetrahedral oxy-radical intermediate.
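The two formyl-radical abstraction energies quoted above also allow an internal consistency check via Hess's law. Subtracting the glycine reaction (formyl• + Gly-H → formic acid + Gly•, -17.5 kcal/mol) from the cysteine reaction (formyl• + Cys-SH → formic acid + Cys-S•, -14.1 kcal/mol) cancels formyl• and formic acid and leaves the cysteine-to-glycyl hydrogen transfer, at +3.4 kcal/mol, the same endothermicity quoted for that step at the start of this section. A minimal Python sketch of this reaction algebra follows; the dictionary layout and the helper names `combine` and `describe` are illustrative assumptions, and formic acid as the abstraction product follows the neutral formic-acid model described above.

```python
from collections import Counter


def combine(reaction_a, reaction_b, sign_b=-1):
    """Return reaction_a + sign_b * reaction_b; species whose net coefficient is 0 cancel."""
    totals = Counter()
    for name, coeff in reaction_a["species"].items():
        totals[name] += coeff
    for name, coeff in reaction_b["species"].items():
        totals[name] += sign_b * coeff
    net = {name: coeff for name, coeff in totals.items() if coeff != 0}
    return {"species": net, "dE": reaction_a["dE"] + sign_b * reaction_b["dE"]}


def describe(reaction):
    """Format a reaction whose coefficients are all +1 or -1 (true for this example)."""
    lhs = " + ".join(sorted(n for n, c in reaction["species"].items() if c < 0))
    rhs = " + ".join(sorted(n for n, c in reaction["species"].items() if c > 0))
    return f"{lhs} -> {rhs}   dE = {reaction['dE']:+.1f} kcal/mol"


# Convention: reactants have negative coefficients, products positive; dE in kcal/mol.
formyl_plus_gly = {
    "species": {"formyl•": -1, "Gly-H": -1, "formic acid": 1, "Gly•": 1},
    "dE": -17.5,
}
formyl_plus_cys = {
    "species": {"formyl•": -1, "Cys-SH": -1, "formic acid": 1, "Cys-S•": 1},
    "dE": -14.1,
}

exchange = combine(formyl_plus_cys, formyl_plus_gly)
print(describe(exchange))
```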
Figure 15. Optimized structure and spin population distribution of the transition state of the homolytic acetyl transfer between cysteines.

A direct nucleophilic attack by the sulfur of a non-radical cysteine on the carbonyl carbon of the acetylated cysteine, yielding a tetrahedral intermediate, was also considered. As in the case of the nucleophilic attack of cysteine on pyruvate, the intermediate lies relatively low in energy (+9.0 kcal/mol), but the barrier is very high (41.2 kcal/mol). For the transfer of the acetyl group from Cys418 to CoA we propose a homolytic radical mechanism similar to that of the acetyl transfer between the two cysteines. From a computational point of view, these two steps are identical, because the same model (methylthiol) was used for both cysteine and CoA. We therefore propose the following steps for the acetylation of CoA:

(1) CoA-SH + Cys419• → CoA-S• + Cys419
(2) CoA-S• + Cys418-acetyl → CoA-acetyl + Cys418•
The first step (1) is a simple hydrogen atom transfer, in which the Cys419 radical abstracts a hydrogen from CoA-SH. This thermoneutral step has a calculated barrier of 2.4 kcal/mol. Step (2) is identical to the acetyl group transfer step between the cysteine residues (Figure 15), with a barrier of 11.6 kcal/mol. The enzyme can now either take up another substrate or regenerate the glycyl radical. The latter could be accomplished through direct hydrogen atom transfer from Gly734 to Cys418 (provided the two are in close proximity), or by using Cys419 as a radical relay. These final steps have all been computed in the preceding steps and have low barriers. The full mechanism based on the calculations is summarized in Figure 16.
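The modified mechanism can be read as a radical relay in which every step hands the unpaired electron to exactly one new carrier. The hypothetical Python sketch below encodes the steps described in this section as (description, carrier before, carrier after, barrier) tuples, checks that the chain is unbroken and returns to Gly734, and reports the largest single-step barrier. The last barrier is not quoted in the text: it is derived here, as an assumption, as the reverse of the first hydrogen transfer (9.9 - 3.4 = 6.5 kcal/mol), relying on both cysteines being described by the same model.

```python
# Each step: (description, radical carrier before, radical carrier after, barrier in kcal/mol)
CYCLE = [
    ("H transfer Cys419-SH -> Gly734•", "Gly734", "Cys419", 9.9),
    ("Cys419• adds to pyruvate C2", "Cys419", "oxy-radical intermediate", 12.3),
    ("intermediate -> Cys419-acetyl + formyl•", "oxy-radical intermediate", "formyl", 2.8),
    ("formyl• abstracts H from Cys418", "formyl", "Cys418", 1.1),
    ("homolytic acetyl transfer Cys419 -> Cys418", "Cys418", "Cys419", 11.6),
    ("Cys419• abstracts H from CoA-SH", "Cys419", "CoA", 2.4),
    ("homolytic acetyl transfer Cys418 -> CoA", "CoA", "Cys418", 11.6),
    # Assumed, not quoted: reverse of the first H transfer (9.9 - 3.4 kcal/mol)
    ("H transfer Gly734-H -> Cys418•", "Cys418", "Gly734", 9.9 - 3.4),
]


def check_cycle(steps):
    """Verify the radical is handed on step by step and returns to its start.

    Returns the step with the largest single-step barrier.
    """
    for (_, _, after, _), (name, before, _, _) in zip(steps, steps[1:]):
        if after != before:
            raise ValueError(f"radical carrier mismatch before step: {name}")
    if steps[-1][2] != steps[0][1]:
        raise ValueError("cycle does not regenerate the starting radical")
    return max(steps, key=lambda step: step[3])


slowest = check_cycle(CYCLE)
print(f"cycle closes; largest single-step barrier: {slowest[0]} ({slowest[3]} kcal/mol)")
```

The largest single-step barrier (12.3 kcal/mol, thiyl addition to pyruvate) is only a rough indicator: it ignores energy accumulated or released in earlier steps, such as the 3.4 kcal/mol endothermicity of the initial hydrogen transfer.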
Figure 16. New reaction mechanism proposed for PFL [3].

During the preparation of this review, two papers were published describing the X-ray crystal structure of PFL. The first structure was solved by Goldman and co-workers but lacks 125 C-terminal residues, including the essential Gly734 [40]. The second structure is complete and is due to Kabsch and co-workers [41]. Gly734 and Cys419 were found to be very close to each other, only 3.7 Å apart, confirming the biochemical data and justifying the theoretical models. The other active-site cysteine residue (Cys418) was found to be more buried inside the protein, but in close proximity to Cys419, allowing for hydrogen transfer between these two residues. Kabsch and co-workers also crystallized PFL in complex with the substrate analogue oxamate, which differs from pyruvate in having an amino group instead of the methyl group (Figure 17). This structure shows that Cys418 is perfectly located to attack the C2 carbon of the substrate. Two arginine residues are, furthermore, thought to bind and stabilize the substrate. The structural results have some implications for the catalytic mechanism. The position of the substrate suggests that Cys418, and not Cys419, performs the radical attack on pyruvate. This would require two hydrogen atom transfers, first from Cys419 to the glycyl radical, and then from Cys418 to Cys419. This is reminiscent of the long-range hydrogen atom transfer in ribonucleotide reductase, where the radical is transferred some 35 Å from the tyrosine at the diiron site in R2 to the active-site cysteine in R1 (see below). The function of Cys419 is hence simply to mediate the radical transfer between Gly734 and Cys418. This renders the acetyl transfer between the two cysteines unnecessary, because the acetyl group is already at Cys418. Since the two cysteines were modeled in exactly the same way in the calculations presented above, all the results found for the addition of the Cys419 radical to pyruvate also apply to Cys418.
Figure 17. X-ray structure of the active site of PFL in complex with the substrate analogue oxamate.

The glycyl radicals found in PFL and anaerobic RNR are remarkably stable. However, exposure of these anaerobic enzymes to oxygenated solutions is known to result in cleavage of the peptide backbone at the site of the glycyl radical [15,42]. Recently, Reddy et al. reported a detailed experimental investigation of the oxidative degradation of wild-type PFL and of samples in which either or both of the two cysteines essential for catalysis were substituted by alanine [43]. Using mass spectrometry and EPR spectroscopy they were able to observe the well-established products resulting from fragmentation at the Cα-N bond of the glycyl radical [15], as well as products indicative of cleavage at the C1-Cα bond. In addition, EPR data suggested the existence of a long-lived sulfinyl radical (R-SO•) at C419 in the wild-type and C418A mutant systems, and of a peroxyl radical in the C419A and C418A/C419A mutants. Based on these observations, three alternative reaction mechanisms were suggested. All of these, plus some additional alternatives, have recently been studied theoretically [4]. Of the large set of reaction pathways examined, the most plausible alternative (a somewhat modified form of the main mechanism proposed by Reddy et al.) can be summarized as follows (Figure 18). Initially, O2 adds to the glycyl radical center in a barrier-free reaction (ΔE = -7.2 kcal/mol). The glycyl-peroxyl radical then abstracts the thiol hydrogen from C419 in an almost thermoneutral reaction with a barrier of 10.4 kcal/mol, followed by OH transfer from Gly-OOH to Cys-S•. This reaction has a barrier similar to that above, but is exothermic by ca. 30 kcal/mol. The overall exothermicity from the initial starting point with G734•, O2 and C419 is 34.7 kcal/mol. The glycyl-alkoxy radical (G734-O•) can now easily abstract the C419-SOH hydrogen to form the observed metastable sulfinyl radical and α-hydroxyglycine. The hydroxyglycine readily undergoes hydrolysis, and gives the observed, 'normal' fragmentation products resulting from cleavage of the Cα-N bond. The H-abstraction step is essentially barrierless, and again exothermic by ca. 30 kcal/mol.
[Figure 18: relative energy (kcal/mol) profile starting from Gly• + O2 + RSH, via Gly-OO• + RSH (-7.2), Gly-OOH + RS• and Gly-O• + RSOH, to the products Gly-OH + RSO• and H2NCHO + OC•OH, with the intervening transition states (TS).]
Figure 18. Main reaction pathways of oxidative degradation of PFL.
Alternatively, the glycyl-alkoxy radical may undergo C-C bond cleavage. This reaction has a barrier of only 2 kcal/mol and is exothermic by ca. 14 kcal/mol, and hence explains the observation of Cα-C1 fragmentation products in the mass spectra. As noted above, very recent X-ray data show that C419 and G734 are in close contact at the active site, whereas C418 and G734 are further apart. This explains the existence of stable peroxyl radicals (Gly-OO•) in the C419A mutants, whereas such species are believed to be only transiently present in enzymes with the native C419. As mentioned above, several alternative reaction pathways were also investigated in this study [4], but none of these was energetically feasible while also explaining all the experimental observations.
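The levels of the oxidative-degradation pathway can be placed on one scale from the step energies quoted above. The minimal Python sketch below does so with Gly• + O2 + RSH set to zero; the values given only as "ca." in the text (30 kcal/mol for the H abstraction, 14 and 2 kcal/mol for the C-C cleavage) are used as if they were exact, which is an approximation, and the constant names are illustrative.

```python
# Step energies (kcal/mol) quoted in the text; "approx" marks values given only as "ca."
O2_ADDITION = -7.2             # Gly• + O2 -> Gly-OO•
OVERALL_TO_ALKOXY = -34.7      # Gly• + O2 + RSH -> Gly-O• + RSOH
H_ABSTRACTION = -30.0          # approx: Gly-O• + RSOH -> Gly-OH + RSO•
CC_CLEAVAGE = -14.0            # approx: Gly-O• -> C-C cleavage fragments
CC_CLEAVAGE_BARRIER = 2.0      # approx: "only 2 kcal/mol"

levels = {
    "Gly• + O2 + RSH": 0.0,
    "Gly-OO• + RSH": O2_ADDITION,
    "Gly-O• + RSOH": OVERALL_TO_ALKOXY,
    "TS (C-C cleavage)": OVERALL_TO_ALKOXY + CC_CLEAVAGE_BARRIER,
    "C-C cleavage fragments": OVERALL_TO_ALKOXY + CC_CLEAVAGE,
    "Gly-OH + RSO•": OVERALL_TO_ALKOXY + H_ABSTRACTION,
}

# Print from highest to lowest energy
for name, energy in sorted(levels.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {energy:7.1f}")

# Net energy of the stage from Gly-OO• to Gly-O• (H abstraction followed by OH transfer)
print("Gly-OO• -> Gly-O• stage:", round(OVERALL_TO_ALKOXY - O2_ADDITION, 1))
```

On this approximate basis both channels from the glycyl-alkoxy radical are strongly downhill, and the sulfinyl-forming H abstraction (about -65 kcal/mol overall) lies well below the C-C cleavage fragments (about -49 kcal/mol).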
5. Ribonucleotide Reductase
Ribonucleotide reductases (RNR) constitute a large group of essential enzymes with a diverse array of primary as well as quaternary structures. Common to all of them is that they catalyze the rate-determining step in DNA biosynthesis, the reduction of ribonucleotides to deoxyribonucleotides (Figure 19) [44,45].
[Figure 19: a ribonucleotide ((P)PPO-ribose-base) is converted by RNR into the corresponding deoxyribonucleotide, in which the 2'-OH group is replaced by H.]
Figure 19. Net reaction of ribonucleotide reductases.
The RNRs are divided into four classes, depending on the cofactors utilized to catalyze the reaction [45]. Class I RNR, which is found in e.g. mammals and E. coli bacteria, employs a stable neutral tyrosyl radical coupled to a di-iron (Fe:O2) cluster [46]. Class II uses 5'-deoxy-5'-adenosylcobalamin [47] (the active form of vitamin B12; see also the chapter by Smith, Wetmore and Radom). Class III is also found in E. coli, when grown under anaerobic conditions, and uses a neutral glycyl radical as cofactor, similarly to the anaerobic PFL enzyme described previously [48]. Class IV, finally, contains what is again believed to be a tyrosyl radical, this time linked to a di-manganese cluster [49]. In addition, class I RNRs have been divided into subclasses Ia and Ib, differing in e.g. their expression mechanism [50]. In this chapter we will consider aerobic class I RNRs only. These systems have been the subject of extensive experimental work, including EPR spectroscopy, isotopic labelling, inhibitor mechanisms, mutagenesis and kinetics studies, and a large number of excellent reviews are available summarizing the present knowledge ([1] and references therein). Besides the actual catalytic machinery, large efforts have also been devoted to understanding the activation processes, i.e., the formation of the di-iron complex and the generation of the stable tyrosyl radical. These latter aspects will not be considered here; the reader is instead referred to, e.g., the recent review by Stubbe and van der Donk [1].
[Figure 20: A, the hydrogen-bonded pathway of about 35 Å linking Tyr122 and the di-iron site (His118, His241, Glu115, Glu238, Asp84) of R2, via Trp48, Asp237 and Tyr356, to Tyr731, Tyr730 and Cys439 at the R1 active site (Cys225, Cys462, substrate). B, computational model of the R2 sub-system, with spin populations.]
Figure 20. A. Model of the interaction between the tyrosyl radical (Tyr122) in the R2 subunit and the active-site residue Cys439 in the R1 subunit of class I RNR. W indicates ligated water molecules. B. Computational model of the R2 sub-system, used for modelling the initial stages of the radical transfer.

The class I RNRs were discovered in the early 1950s by Reichard et al. [46], and were the first enzymatic system that could be shown unequivocally, by means of EPR spectroscopy, to harbour a stable amino acid radical [51]. The system is known to contain two loosely connected homodimers, consisting of two R1 and two R2 subunits, respectively. The enzyme requires the two homodimers to be connected in order to function. The tyrosyl radical is located in one of the R2 subunits and is connected to the active site in the R1 subunit via a 35 Å long chain of hydrogen-bonded amino acids (Figure 20). Substituting any of the amino acids along the pathway by a residue less prone to H-bonding and H-atom migration results in inactivation or significantly reduced turnover rates of the enzyme.
The R1 active site harbours five conserved residues: Cys225, Cys439, Cys462, Glu441 and Asn437. Experimental evidence suggests that as the substrate enters the active-site pocket, the radical is triggered to migrate from Tyr122 up to one of the cysteine residues (Cys439) at the active site. Once the radical character has entered the active site, the substrate is able to undergo radical-catalyzed conversion from ribose to deoxyribose, including loss of water and formation of a disulfide bridge between residues C225 and C462, which hence serve as reducing equivalents for the nucleotide reduction. Upon completion of the catalysis, the thiyl radical is regenerated at C439, and the radical character is transferred back to the Y122 residue buried deep in the R2 subunit. Figure 21 displays the catalytic mechanism proposed by Stubbe, based on a large compilation of experimental data [1,44,52].
Figure 21. RNR class I catalytic scheme as proposed by Stubbe (exp.) [1].

In order to understand the radical transfer mechanism between Tyr122 and the active site, we first need to consider the protonation state of the tyrosyl radical. In Table 4 we list the EPR parameters (α-protons) and unpaired spin density distributions of neutral vs charged tyrosyl radicals, and compare the data with results obtained for Y122 in wild-type E. coli RNR. From the data listed, it is clear that the tyrosyl radical is neutral. This has important implications for the radical transfer, i.e., it is not a case of pure electron transfer but rather of H-atom transfer or, alternatively, coupled electron-proton transfer. This also agrees with theoretical studies implying that pure electron transfer between Y122 and C439 would be endothermic by as much as 40 kcal/mol, and hence most unlikely [53].

Table 4. Carbon and oxygen spin densities, and α-proton HFCCs, of the neutral ethyl phenoxyl radical and the ethyl phenoxyl radical cation, compared with E. coli RNR Tyr122• [54].
Center    E. coli RNR Tyr122• [54]   CH3CH2C6H4O•   CH3CH2C6H4OH•+
C1         0.38                       0.39            0.43
C2,C6     -0.08                      -0.12           -0.05
C3,C5      0.25                       0.28            0.16
C4        -0.05                      -0.03            0.21
O          0.29                       0.37            0.20
αH Axx  αH Ayy
-9.6
1.7 -7.0
αH Azz 2.7 -2.8
0.5
1.6
2.4
-8.8
-6.6
-2.0
-0.9 -6.0
-0.5 -4.7
1.7 -0.5
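The conclusion that Tyr122• is neutral can be quantified from the spin densities in Table 4. As a minimal sketch, the Python code below computes an unweighted root-mean-square deviation between each model and the E. coli Tyr122• values. Counting each symmetry-equivalent pair (C2,C6 and C3,C5) once, and using RMS deviation as the metric, are our assumptions rather than the procedure of the original work.

```python
import math

CENTERS = ["C1", "C2,C6", "C3,C5", "C4", "O"]
# Spin densities from Table 4, one value per listed center
EXPERIMENT_TYR122 = [0.38, -0.08, 0.25, -0.05, 0.29]
NEUTRAL_PHENOXYL = [0.39, -0.12, 0.28, -0.03, 0.37]
CATION_PHENOXYL = [0.43, -0.05, 0.16, 0.21, 0.20]


def rms_deviation(calc, ref):
    """Root-mean-square deviation between two equally long spin-density lists."""
    if len(calc) != len(ref):
        raise ValueError("lists must have the same length")
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(calc, ref)) / len(ref))


for label, calc in [("neutral", NEUTRAL_PHENOXYL), ("cation", CATION_PHENOXYL)]:
    # Center with the largest absolute deviation from experiment
    worst = max(zip(CENTERS, calc, EXPERIMENT_TYR122), key=lambda t: abs(t[1] - t[2]))
    print(f"{label:8s} RMS = {rms_deviation(calc, EXPERIMENT_TYR122):.3f}  "
          f"largest deviation at {worst[0]}")
```

The neutral phenoxyl model deviates by about 0.04 on average, against about 0.13 for the radical cation, whose largest discrepancy is at C4 (0.21 versus -0.05).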
The model employed to mimic the H-atom transfer pathway can be divided into two parts: the first includes the Tyr-Fe-His-Asp-Trp system of the R2 subunit, and the second the two neighbouring tyrosines Y731 and Y730 and the final C439 residue of the R1 subunit (see Figure 20). The connectivity between the R1 and R2 subunits is not yet fully established, due to difficulties in isolating and crystallizing the complete enzyme without it falling apart. The most likely route is via Trp48 to Tyr356, which is located close to the surface of R2, and from there onwards via Tyr731 and Tyr730 in the R1 unit. The initial stage involves radical transfer from water ligated to the Fe(III,III) cluster over to Y122, as displayed in Figure 20. Initially, all spin (4.00) is located on the Fe(III) ion included in the model. Upon H-atom transfer from water to tyrosyl, a spin of 4.04 is found on Fe(III) (Figure 22), indicating that the Fe(III,III) cluster is preserved, rather than passing via the mixed-valence Fe(IV,III) cluster. Also notable is the immediate build-up of charge and spin on the Trp48 residue (the spin goes from 0.33 to 1.07, and the charge increases from +0.20 to +0.73), whereas the charge on the water/OH ligand that has donated its hydrogen to Y122 goes from +0.16 to -0.36. This is indicative of electron transfer between W48 and the water, caused by the initial H-atom transfer process. The situation is hence very similar to that seen in cytochrome c peroxidase, in which electron transfer occurs from a Trp residue, via an Asp-His sequence, to a heme-bound iron upon O-O bond cleavage. The Trp radical cation formed thus retains its proton, although in the equilibrium structure the proton has moved closer to the hydrogen-bonding oxygen of Asp237.
Figure 22. A. Optimized intermediate in the R2 radical transfer chain, after H-transfer to Tyr122. The spin is immediately localized on Trp48. B. Final step in the R2 subunit, obtained after hydrogenation of Asp237 (radical transfer onwards to the R1 subunit) [5].

The energetics of these initial stages show an essentially thermoneutral process, i.e., the energy of formation of the Trp radical cation and rupture of the O-H bond of the ligated water is almost identical to the O-H bond strength of tyrosine, 86.5 kcal/mol. Dielectric effects are shown to be important in this step, owing to the large charge transfer, and increase the O-H bond strength of the ligated water by as much as 9.6 kcal/mol, thereby bringing it closer to the O-H bond strength of tyrosine. Once the Trp48 radical cation is formed, the radical character should be transferred onwards via Tyr356 to the R1 unit. This is modelled by hydrogenating Asp237 (since Trp48 is still protonated), Figure 22B, and again the binding energy is almost identical to that of tyrosine. The entire radical transfer within the R2 subunit is hence essentially thermoneutral.
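To see why near-thermoneutrality matters for a radical shuttle that must run in both directions, a minimal sketch converts reaction energies into equilibrium constants, K = exp(-ΔE/RT) at 298 K. It treats the computed energies as free energies, an assumption the text does not make, and the helper name `equilibrium_constant` is hypothetical. A step with ΔE ≈ 0 gives K ≈ 1, so the radical can move forward and back with comparable ease, whereas the 9.6 kcal/mol dielectric shift in the O-H bond strength of the ligated water corresponds to a factor of about 10^7 in K.

```python
import math

R_KCAL = 8.314462618 / 4184.0  # gas constant in kcal/(mol K)


def equilibrium_constant(delta_e_kcal, temperature=298.15):
    """K = exp(-dE/RT), treating the reaction energy as a reaction free energy."""
    return math.exp(-delta_e_kcal / (R_KCAL * temperature))


# 0.0: thermoneutral step; 0.4: the slightly endothermic Y730•/C439 step;
# 2.0: an arbitrary comparison value; 9.6: the dielectric shift quoted above
for delta_e in (0.0, 0.4, 2.0, 9.6):
    print(f"dE = {delta_e:4.1f} kcal/mol -> K = {equilibrium_constant(delta_e):.2e}")

# Size of a 9.6 kcal/mol shift in reaction energy, expressed as a factor in K
print(f"factor for a 9.6 kcal/mol shift: {1 / equilibrium_constant(9.6):.1e}")
```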
Figure 23. Transition state structure for H-atom transfer between Y731 and Y730 (bond lengths in Å). Note that the two phenolic rings are π-stacked.

The second step involves radical transfer within the R1 unit, and could be shown to involve pure H-atom transfer [5]. In these calculations, the protein backbone joining Y730 and Y731 had a pronounced effect, in that it keeps the residues in a geometry resembling that of the transition state (Figure 23). The barrier for H-atom transfer is only 4.9 kcal/mol, and the dielectric effects are negligible. Allowing the tyrosines to move freely (i.e., removing the backbone) instead raises the barrier to 9.5 kcal/mol, owing to the more stable structures of the reactants and products. The TS looks highly similar to that observed in the full calculations. The H-atom transfer nature of this step (rather than electron + proton transfer) is clearly manifested in the geometry, in that the H atom bends ca. 50° out of the plane of the rings, and in that it retains its electron throughout.
[Figure 24: relative energy (kcal/mol): Tyr-O• / Tyr-OH / Cys-SH 0.0 → TS +4.9 → Tyr-OH / Tyr-O• / Cys-SH 0.0 → TS +8.1 → Tyr-OH / Tyr-OH / Cys-S• +0.4]
Figure 24. Energetics of the R1 H-atom transfer steps (Y730-Y731-C439). All energies are ZPE corrected and include dielectric effects.

The final step of the radical transfer is H-atom migration between the tyrosyl radical Y730• and C439. This has a barrier of ca. 8 kcal/mol and is only slightly endothermic (0.4 kcal/mol). The overall radical transfer between Y122 (R2) and C439 (R1) is hence characterised by thermoneutrality and very low barriers, both prerequisites for a fast "radical shuttle" up to the active site when needed, as well as back into the protected environment once the substrate reaction is over (Figure 24).

The substrate mechanism has been investigated theoretically in detail by Siegbahn [6], based on the mechanism previously suggested by Stubbe (Figure 25). Until recently, no intermediates had been isolated within the catalytic cycle, and the proposed mechanism was based primarily on mutagenesis experiments, isotope labelling and inhibitor studies. In a mutagenesis study by Sjöberg and co-workers, the Glu441 residue at the active site was replaced by Ala, Asp or Gln, which revealed the explicit dependence of catalytic turnover on Glu441 [55]. In addition, they could record the EPR spectrum of a new transient radical intermediate, most likely localized at the 3' position of the ribose. This work hence made it clear that not only the three cysteine residues were required, but also Glu441, providing H-bonding interactions and serving as a "proton shuttle" (see below). In addition, the radical-based catalytic mechanism could be definitively established. Based on a large set of model calculations, Siegbahn revised and extended the Stubbe mechanism to invoke both the Glu441 residue and the conserved Asn437, hydrogen bonded to Glu441, in the catalytic machinery. The initial reaction in this six-step mechanism is the abstraction of the C3'-H of the ribose by the Cys439 radical formed through the radical transfer mechanism (Figure 25). The subsequent step involves the simultaneous loss of the O3' hydrogen to Glu441, formation of an O3'-C3' double bond, transfer of the radical site to C2' of the ribose, loss of the C2'-OH group, and formation of water from the C2'-OH group and the carboxylic hydrogen already bound to Glu441. Stabilizing H-bonding interactions with Asn437 are crucial for this complex sequence to proceed with a low barrier.
Next, the two cysteine residues C225 and C462 come into play. C225 loses its hydrogen to the C2' radical site of the substrate, C462 donates its hydrogen to the C225 thiyl radical, which forms a complex with the C3' position, and the O3' hydrogen is returned from Glu441. According to the computed energetics, this step is rate limiting. In step 5, C225 is released from the sugar to form a disulfide bridge between C225 and C462, and the radical character is transferred back to the C3' position. The final step involves back-transfer to C3' of the H atom initially taken by Cys439. We have now formed the deoxyribose product, water and a disulfide bridge. The radical character at C439 is transferred back to Tyr122 of the R2 subunit while the product leaves, and the disulfide bridge is reduced through a sequence of coupled enzymatic reactions to regenerate the active site for another turnover. The mechanism is displayed in Figure 25, and Figure 26 shows the overall energetics of the mechanism. It should be said, though, that the model has received some criticism, although it does seem to fulfil essentially all experimental observations to date, including the above-mentioned mutagenesis data for Glu441.
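The net stoichiometry of this cycle (the 2'-OH replaced by H, one water released, and two cysteine thiols oxidized to a disulfide) can be checked for atom balance. The minimal Python sketch below assumes free D-ribose (C5H10O5) and 2-deoxyribose (C5H10O4) as stand-ins for the nucleotide sugars, and methylthiol (CH4S) and dimethyl disulfide (C2H6S2) for Cys225/Cys462, mirroring the methylthiol cysteine model used for PFL; the helper names are illustrative and the formula parser handles only simple formulas without brackets.

```python
import re
from collections import Counter


def atom_counts(formula):
    """Count atoms in a simple molecular formula without brackets, e.g. 'C5H10O5'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(side):
    """Total atom counts for a list of (stoichiometric coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in atom_counts(formula).items():
            total[element] += coefficient * n
    return total


# ribose + 2 CH3SH -> 2-deoxyribose + CH3SSCH3 + H2O (model stand-ins, see lead-in)
reactants = [(1, "C5H10O5"), (2, "CH4S")]
products = [(1, "C5H10O4"), (1, "C2H6S2"), (1, "H2O")]
print("balanced:", side_counts(reactants) == side_counts(products))
```

The Cys439 radical does not appear in the balance, since it is regenerated at the end of each turnover.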
Figure 25. RNR class I reaction scheme as proposed by Siegbahn (comp.) [6].
[Figure 26: relative energy (kcal/mol) of the key sugar intermediates along the reaction steps.]
Figure 26. Overall energetics of the catalytic mechanism of class I RNRs [6]. The key intermediates of the sugar moiety are shown; the labelling of the different steps refers to Figure 25.
6. Concluding Remarks

In the present chapter we have shown results from theoretical model-system studies of the catalytic reaction mechanisms of three radical enzymes: galactose oxidase, pyruvate formate-lyase and ribonucleotide reductase. We conclude that small models of the key parts of the active sites, in combination with the B3LYP hybrid density functional and large basis sets, provide a good description of the catalytic machineries, with low barriers for the rate-determining steps and moderate overall exothermicity. The models employed are furthermore able to reproduce all the observed features in terms of spin distributions and reactive intermediates. For the Cu-containing system galactose oxidase, we conclude that the unpaired spins are initially located on the Cu atom and on the axial tyrosine, whereas the equatorial, cysteine-linked tyrosine obtains the unpaired spin upon proton transfer from the substrate to the axial Tyr. The computed barrier for the rate-determining step (H-atom transfer from the substrate to the Tyr-S• moiety) is in excellent agreement with experimental data, whereas the charge transfer between Cu(II) and the substrate ketyl anion is less exothermic than estimated experimentally. Based on the computed data and the overall net exothermicity of the substrate reaction, however, a question is raised regarding the experimental estimates. In pyruvate formate-lyase, the active site contains a stable glycyl radical and two catalytically active cysteines. Two different proposed reaction mechanisms were tested, and it was concluded that the mechanism suggested by Kozarich et al. is by far the energetically most favourable one, although a minor modification in terms of radical-mediated transesterification is proposed. Again, the reaction is characterized by a rather low barrier for the rate-determining step. In addition, the mechanism for oxidative degradation of this anaerobic enzyme is outlined, based on a large set of model calculations. In ribonucleotide reductase, finally, the radical transfer mechanism between the stable tyrosyl radical in the R2 subunit and the cysteine residue at the R1 active site is outlined, and shown to primarily invoke a neutral H-atom transfer pathway, with very low barriers and thermoneutrality. The substrate mechanism is also outlined, based again on a model slightly modified compared with the original experimental proposals. Several other radical enzymes have been investigated theoretically by us and others, such as DNA photolyase, Cu amine oxidase and prostaglandin H synthase, but we have found it beyond the scope of the present chapter to include all of these.
Acknowledgements

The following people are gratefully acknowledged for their active roles in the studies of the above systems: M. Pavlov/Wirstam, Dr. J. Gauld, and Profs. G.T. Babcock, F. Maseras, P.E.M. Siegbahn and A. Gräslund. The Swedish Natural Sciences Research Council (NFR) is gratefully acknowledged for financial support. We also acknowledge the supercomputing centers at the Royal Institute of Technology in Stockholm (PDC) and at Linköping University (NSC) for generous grants of computer time, and Profs. J. Knappe and W. Kabsch for providing us with the coordinates of PFL prior to release in the PDB.
References [1] Stubbe, J.-A.; van der Donk, W.A., Chem. Rev. 98 (1998) 705. [2] Himo, F.; Eriksson, L.A.; Maseras, F.; Siegbahn, P.E.M., J. Am. Chem. Soc., in press, (2000). [3] Himo, F.; Eriksson, L.A., J. Am. Chem. Soc. 120 (1998) 11449. [4] Gauld, J.W.; Eriksson, L.A., J. Am. Chem. Soc. 122 (2000) 2035. [5] Siegbahn, P.E.M.; Eriksson, L.A.; Himo, F.; Pavlov, M., J. Phys. Chem. B 102 (1988) 10622. [6] Siegbahn, P.E.M., J. Am. Chem. Soc. 120 (1998) 8417. [7] Pederson, J.Z.; Finazzi-Agrb, FEBS Letters 325 (1993) 53. [8] Kozarich, J.W.; Brush, E.J., in The Enzymes, Sigman, D.S., Ed., Academic Press: San Diego 1992, Vol. XX, 317. [9] Becke, A.D., J. Chem. Phys. 98 (1993) 1372; idem ibid 5648; Lee, C.; Yang, W.; Parr, R.G., Phys. Rev. B37 (1988) 785; Stevens, P.J.; Devlin, F.J.; Chablowski, C.F.; Frisch, M.J., J. Phys. Chem. 98 (1994) 11623. [10] Krishnan, R.; Binkley, J.S.; Pople, J.A.J. Chem. Phys. 72 (1980) 650; (b) McLean, A.D.; Chandler, G.S.J. Chem. Phys. 72 (1980) 5639; (c) Frisch, M.J.;
179 Binkley, J.S.; Pople, J.A.J. Chem. Phys. 80 (1984) 3265; Dunning, T.H., Jr; Hay, P.J., in Modern Theoretical Chemistry, Schaefer III, H.F., Ed., Plenum, New York, Vol.3, p.1; Hay, P.J., Wadt, W.R., J. Chem. Phys. 82 (1985) 270, 284, 299. [11] Gaussian 94 (Revision E.2), M.J. Frisch, G.W. Trucks, H.B. Schlegel, P.M.W. Gill, B.G. Johnson, M.A. Robb, J.R. Cheeseman, T.A. Keith, G.A. Peterson, J.A. Montgomery, K. Raghavachari, M.A. A1-Laham, V.G. Zakrzewske, J.V. Ortiz, J.B. Foresman, J. Cioslowski, B.B. Stefanov, A. Nanayakkara, M. Challacombe, C.Y. Peng, P.Y. Ayala, W. Chen, M.W. Wong, J.L. Andres, E.S. Replogle, R. Gomperts, R.L. Martin, D.J. Fox, J.S. Binkley, D.J. Defrees, J. Baker, J.P. Stewart, M. Head-Gordon, C. Gonzalez and J.A. Pople, Gaussian Inc. Pittsburgh, PA, 1995. [12] Gaussian 98 (Revision A.7) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Zakrzewski, V. G.; Montgomery, J. A., Jr.; Stratmann, R. E.; Burant, J. C.; Dapprich, S.; Millam, J. M.; Daniels, A. D.; Kudin, K. N.; Strain, M. C.; Farkas, O.; Tomasi, J.; Barone, V.; Cossi, M.; Cammi, R.; Mennucci, B.; Pomelli, C.; Adamo, C.; Clifford, S.; Ochterski, J.; Petersson, G. A.; Ayala, P. Y.; Cui, Q.; Morokuma, K.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Cioslowski, J.; Ortiz, J. V.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Gomperts, R.; Martin, R. L.; Fox, D. J.; Keith, T.; A1-Laham, M. A.; Peng, C. Y.; Nanayakkara, A.; Gonzalez, C.; Challacombe, M.; Gill, P. M. W.; Johnson, G.; Chen, W.; Wong, M. W.; Andres, J. L.; Gonzalez, C.; Head-Gordon, M.; Replogle, E. A.; Pople, J. A.,Gaussian, Inc., Pittsburgh PA, 1998. [13] Malkin, V.G.; Malkina, O.L.; Eriksson, L.A.; Salahub, D.R., In Modern Density Functional Theory: A Tool for Chemistry, Seminario, J.M.; Politzer, P.; Eds; Elsevier: Amsterdam, 1995, pp 273-346. 
[14] Barone, V., in Recent Advances in Density Functional Methods, Vol. 1, Chong, D.P., Ed.; World Scientific: Singapore, 1995, pp. 287-334.
[15] Wagner, A.F.V.; Frey, M.; Neugebauer, F.A.; Schäfer, W.; Knappe, J., Proc. Natl. Acad. Sci. USA 89 (1992) 996.
[16] Sun, X.; Ollagnier, S.; Schmidt, P.P.; Atta, M.; Mulliez, E.; Lepape, L.; Eliasson, R.; Gräslund, A.; Fontecave, M.; Reichard, P.; Sjöberg, B.-M., J. Biol. Chem. 269 (1994) 27815.
[17] Miertus, S.; Scrocco, E.; Tomasi, J., Chem. Phys. 55 (1981) 117; Barone, V.; Cossi, M.; Tomasi, J., J. Comp. Chem. 19 (1998) 404; Cances, M.T.; Mennucci, B.; Tomasi, J., J. Chem. Phys. 107 (1997) 3032.
[18] Klinman, J.P., Chem. Rev. 96 (1996) 2541; Whittaker, J.W., in Metal Ions in Biological Systems, Vol. 30, Metalloenzymes Involving Amino Acid-Residue and Related Radicals, Sigel, H.; Sigel, A., Eds.; Marcel Dekker, Inc., New York (1994) p. 315.
[19] Ito, N.; Phillips, S.E.V.; Stevens, C.; Ogel, Z.B.; McPherson, M.J.; Keen, J.N.; Yadav, K.D.S.; Knowles, P.F., Nature 350 (1991) 87.
[20] Whittaker, M.M.; Whittaker, J.W., J. Biol. Chem. 263 (1988) 6074; Branchaud, B.P.; Montague-Smith, M.P.; Kosman, D.J.; McLaren, F.R., J. Am. Chem. Soc. 115 (1993) 798; Whittaker, M.M.; Whittaker, J.W., Biophys. J. 64 (1993) 762.
[21] Wachter, R.M.; Branchaud, B.P., J. Am. Chem. Soc. 118 (1996) 2782; Wachter, R.M.; Branchaud, B.P., Biochemistry 35 (1996) 14425; Wachter, R.M.; Montague-Smith, M.P.; Branchaud, B.P., J. Am. Chem. Soc. 119 (1997) 7743.
[22] Maseras, F.; Morokuma, K., J. Comp. Chem. 16 (1995) 1170.
[23] Maseras, F., Top. Organomet. Chem. 4 (1999) 165; Ujaque, G.; Maseras, F.; Lledos, A., J. Am. Chem. Soc. 121 (1999) 1317; Maseras, F.; Eisenstein, O., New J. Chem. 22 (1998) 5.
[24] Curtiss, L.A.; Raghavachari, K.; Trucks, G.W.; Pople, J.A., J. Chem. Phys. 94 (1991) 7221.
[25] Wang, Y.; Stack, T.D.P., J. Am. Chem. Soc. 118 (1996) 13097; Wang, Y.; DuBois, J.L.; Hedman, B.; Hodgson, K.O.; Stack, T.D.P., Science 279 (1998) 537.
[26] Rothlisberger, U.; Carloni, P., Int. J. Quant. Chem. 73 (1999) 209.
[27] Wachter, R.M.; Branchaud, B.P., Biochemistry 35 (1996) 14425.
[28] Wachter, R.M.; Branchaud, B.P., Biochim. Biophys. Acta 1384 (1998) 43.
[29] Itoh, S.; Hirano, K.; Furuta, A.; Komatsu, M.; Ohshiro, Y.; Ishida, A.; Takamuku, S.; Kohzuma, T.; Nakamura, N.; Suzuki, S., Chem. Lett. (1993) 2099.
[30] Baron, A.J.; Stevens, C.; Wilmot, C.M.; Knowles, P.F.; Phillips, S.E.V.; McPherson, M.J., Biochem. Soc. Trans. 21 (1993) 319S; McPherson, M.J.; Stevens, C.; Baron, A.J.; Ogel, Z.B.; Seneviratne, K.; Wilmot, C.M.; Ito, N.; Brocklebank, I.; Phillips, S.E.V.; Knowles, P.F., Biochem. Soc. Trans. 21 (1993) 752; Baron, A.J.; Stevens, C.; Wilmot, C.M.; Seneviratne, K.D.; Blakeley, V.; Dooley, D.M.; Phillips, S.E.V.; Knowles, P.F.; McPherson, M.J., J. Biol. Chem. 269 (1994) 25095.
[31] Whittaker, M.M.; Chuang, Y.-Y.; Whittaker, J.W., J. Am. Chem. Soc. 115 (1993) 10029.
[32] Babcock, G.T.; El-Deeb, M.K.; Sandusky, P.O.; Whittaker, M.M.; Whittaker, J.W., J. Am. Chem. Soc. 114 (1992) 3727.
[33] Engström, M.; Himo, F.; Ågren, H., Chem. Phys. Lett. 319 (2000) 191.
[34] Himo, F.; Babcock, G.T.; Eriksson, L.A., Chem. Phys. Lett. 313 (1999) 374; Wise, E.W.; Pate, J.B.; Wheeler, R.A., J. Phys. Chem. B 103 (1999) 4772.
[35] Himo, F.; Eriksson, L.A.; Blomberg, M.R.A.; Siegbahn, P.E.M., Int. J. Quant. Chem. 76 (2000) 714.
[36] Knappe, J.; Wagner, A.F.V., Methods in Enzymology 258 (1995) 343.
[37] Wong, K.K.; Kozarich, J.W., in Metal Ions in Biological Systems, Vol. 30, Metalloenzymes Involving Amino Acid-Residue and Related Radicals, Sigel, H.; Sigel, A., Eds.; Marcel Dekker, Inc. (1994) 279.
[38] Young, P.; Andersson, J.; Sahlin, M.; Sjöberg, B.-M., J. Biol. Chem. 271 (1996) 20770.
[39] Brush, E.J.; Lipsett, K.A.; Kozarich, J.W., Biochemistry 27 (1988) 2217; Parast, C.V.; Wong, K.K.; Lewisch, S.A.; Kozarich, J.W.; Peisach, J.; Magliozzo, R.S., Biochemistry 34 (1995) 2393.
[40] Leppänen, V.-M.; Merckel, M.C.; Ollis, D.L.; Wong, K.K.; Kozarich, J.W.; Goldman, A., Structure 7 (1999) 733.
[41] Becker, A.; Fritz-Wolf, K.; Kabsch, W.; Knappe, J.; Schultz, S.; Wagner, A.F.V., Nature Struct. Biol. 6 (1999) 969.
[42] Yu, D.; Rauk, A.; Armstrong, D.A., J. Am. Chem. Soc. 117 (1995) 1789.
[43] Reddy, S.G.; Wong, K.K.; Parast, C.V.; Peisach, J.; Magliozzo, R.S.; Kozarich, J.W., Biochemistry 37 (1998) 558.
[44] Eriksson, S.; Sjöberg, B.-M., in Allosteric Enzymes, Hervé, G., Ed.; CRC, Boca Raton (1989) p. 189; Stubbe, J., Adv. Enzymol. Relat. Areas Mol. Biol. 63 (1990) 349; Stubbe, J.; van der Donk, W.A., Chem. Biol. 2 (1995) 793; Sjöberg, B.-M., in Nucleic Acids and Molecular Biology, Eckstein, F.; Lilley, D., Eds.; Springer, Berlin (1995), Vol. 9, p. 192.
[45] Reichard, P., Science 260 (1993) 1773.
[46] Hammarsten, E.; Reichard, P.; Saluste, E., J. Biol. Chem. 183 (1950) 105; Reichard, P.; Estborn, B., J. Biol. Chem. 188 (1951) 839.
[47] Blakley, R.L.; Barker, H.A., Biochem. Biophys. Res. Commun. 16 (1964) 391; Beck, W.S.; Hardy, J., Proc. Natl. Acad. Sci. USA 54 (1965) 286.
[48] Barlow, T., Biochem. Biophys. Res. Commun. 155 (1988) 747; Fontecave, M.; Eliasson, R.; Reichard, P., Proc. Natl. Acad. Sci. USA 86 (1989) 2147.
[49] Schimpff-Weiland, G.; Follmann, H.; Auling, G., Biochem. Biophys. Res. Commun. 102 (1981) 1276; Willing, A.; Follmann, H.; Auling, G., Eur. J. Biochem. 179 (1988) 603; Griepenburg, U.; Lassmann, G.; Auling, G., Free Rad. Res. 26 (1996) 473.
[50] Jordan, A.; Aragall, E.; Gibert, I.; Barbé, J., Mol. Microbiol. 19 (1996) 777.
[51] Ehrenberg, A.; Reichard, P., J. Biol. Chem. 247 (1972) 3485; Sjöberg, B.-M.; Reichard, P.; Gräslund, A.; Ehrenberg, A., J. Biol. Chem. 253 (1978) 6863.
[52] See e.g. Licht, S.; Stubbe, J., in Comprehensive Natural Products Chemistry, Poulter, C.D., Licht, S.; Stubbe, J., Eds., Elsevier, New York, 1998.
[53] Siegbahn, P.E.M.; Blomberg, M.R.A.; Pavlov, M., Chem. Phys. Lett. 292 (1998) 421.
[54] Hoganson, C.W.; Sahlin, M.; Sjöberg, B.-M.; Babcock, G.T., J. Am. Chem. Soc. 118 (1996) 4672; Bender, C.J.; Sahlin, M.; Babcock, G.T.; Barry, B.A.; Chandrsekhar, T.K.; Salowe, S.P.; Stubbe, J.; Lindström, B.; Ehrenberg, A.; Sjöberg, B.-M., J. Am. Chem. Soc. 111 (1989) 8076.
[55] Persson, A.L.; Eriksson, M.; Katterle, B.; Pötsch, S.; Sahlin, M.; Sjöberg, B.-M., J. Biol. Chem. 272 (1997) 31533.
L.A. Eriksson (Editor)
Theoretical Biochemistry: Processes and Properties of Biological Systems. Theoretical and Computational Chemistry, Vol. 9. © 2001 Elsevier Science B.V. All rights reserved.
Chapter 5
Theoretical Studies of Coenzyme B12-Dependent Carbon-Skeleton Rearrangements
David M. Smith, Stacey D. Wetmore and Leo Radom
Research School of Chemistry, Australian National University, Canberra, ACT 0200, Australia
1. INTRODUCTION
Vitamin B12 (cyanocobalamin) [1] was first isolated more than 50 years ago [2], when it was found to be associated with the prevention of pernicious anaemia in humans [3]. Later, Barker discovered the principal, naturally occurring form of vitamin B12 to be 5'-deoxyadenosylcobalamin [4]. This compound has since become known as coenzyme B12, reflecting the fact that a number of enzymes require it in order to perform their biological function properly. Each enzyme-coenzyme B12 partnership has been found to facilitate a rearrangement in which a substrate hydrogen atom and a functional group (X) on adjacent carbon atoms apparently change places [5, 6]:

$$\mathrm{R^1R^2C(X){-}C(H)R^3R^4 \;\longrightarrow\; R^1R^2C(H){-}C(X)R^3R^4} \qquad (1)$$
An abundance of B12 literature is concerned with identifying the mechanism by which this remarkable compound facilitates otherwise difficult reactions. Although it is now generally accepted that B12-dependent rearrangements occur through pathways involving free radicals, relatively little is known about the nature of the proposed intermediates. Elucidating the intricate mechanisms of its metabolic functions therefore remains the major challenge in the B12 field. The aim of the present chapter is to describe our investigations, using high-level ab initio molecular orbital theory, of one particular class of rearrangements catalyzed by B12, the so-called carbon-skeleton mutases. Given the difficulty of studying highly reactive species (radicals) experimentally, a computational approach is an ideal way to tackle this problem. Using the results from our calculations on model systems, we are able to make a significant addition to existing B12 knowledge by comparing and contrasting various proposed mechanisms for the rearrangements, a task that is not easy to perform experimentally. In some of the work presented in this chapter, we attempt to step beyond this role and suggest specific ways in which the proteins may interact with the substrates to accelerate the rearrangements. Before discussing the systems that form the focus of the current chapter, we outline the relevant B12 literature (Section 2). In addition, we briefly describe the theoretical methods used and assess their reliability through comparison with available experimental data for a test reaction (Section 3).
2. BACKGROUND
2.1. Vitamin B12: What Is It?
[Scheme 1. The structure of the cobalamins: (Ia) vitamin B12, R = CN; (Ib) methylcobalamin, R = CH3; (Ic) adenosylcobalamin, R = adenosyl (5'-deoxyadenosyl); (Id) hydroxocobalamin, R = OH; (Ie) cob(II)alamin, R = free electron.]

The study of vitamin B12 (cyanocobalamin; Ia in Scheme 1) has provided many challenges for those researching its properties since its discovery, often serving as a benchmark for the advancement of chemical science. For example, the determination of its complex structure by X-ray crystallography [7] was heralded as a major achievement in that field. Perhaps even more impressive was the successful chemical synthesis of the vitamin, an enormous effort involving numerous research groups and many years of dedication [8]. The equally mammoth task of elucidating just some of the biosynthetic pathways responsible for the production of B12 has only recently been completed [9]. By contrasting the pathways that operate in the presence and in the absence of oxygen, scientists are uncovering many clues to chemistry as it may have been early in the history of the earth. Cyanocobalamin (Ia) is a relatively inert complex and, apart from being involved in the detoxification of small amounts of hydrogen cyanide [5], does not appear to serve any major biological function [10]. The only difference between this species and the metabolically active forms of B12 (methylcobalamin (MeCbl; Ib) and 5'-deoxyadenosylcobalamin (AdoCbl; Ic)) is the ligand that occupies the upper axial position (R in Scheme 1). The latter two compounds, with a saturated carbon in this position, are sensitive to light and, in the absence of strict precautions, are quickly converted to hydroxocobalamin (HOCbl; Id) in aqueous solution. The hydroxyl group of Id is readily displaced by any cyanide ions that happen to be present, giving the characteristic red crystals of cyanocobalamin (Ia). This degradative pathway is thought to have operated during the original isolation procedure [2], which explains how the biologically inactive compound Ia came to be known as vitamin B12. A striking feature of the B12 complexes (Ia, Ib and Ic) is the presence of a naturally occurring cobalt-carbon bond. The rarity of a water-stable example of such a bond led Abeles to predict that "in this bond probably lies the secret of its reactivity" [11]. Indeed, as we shall see later, there is now substantial support for this insightful prediction.
2.2. Coenzyme B12: What Does It Do?
The coenzyme-B12-dependent reactions (and the associated enzymes) described by equation 1 have been divided into three classes [6]. The first class of coenzyme-B12-dependent enzymes is represented by the carbon-skeleton mutases (e.g., reactions a-c in Scheme 2), in which the migrating group X is part of the carbon framework. Inspection of Scheme 2 shows the carboxylic acid group to be a plausible candidate for migration in each case. However, isotope-labeling experiments have established that this is not the case: the migrating group is the acrylic acid substituent for 2-methyleneglutarate [12], the thioester group for methylmalonyl-CoA [13], and the glycyl fragment for glutamate [14]. The migration takes place with inversion of stereochemistry for glutamate [15, 16] and 2-methyleneglutarate [16], but with retention of stereochemistry for (R)-methylmalonyl-CoA [17]. The second class of B12-dependent enzymes is represented by the aminomutases. All reactions in this class involve the 1,2-shift of an amino group. More often than not, the substrates involved in these transformations are the all-important α-amino acids. A typical example is provided by L-leucine-2,3-aminomutase (reaction d in Scheme 2). In the last group of reactions, known as the eliminases and reductases (e.g., reactions e-g in Scheme 2), the involvement of a 1,2-migration (equation 1) is not immediately apparent. However, for the diol-dehydratase-catalyzed reactions (e.g., e) [18], there is good evidence that a hydroxyl group does undergo a 1,2-shift prior to the elimination of water [19-21]. Similarly, in the case of ethanolamine ammonia lyase (f), the amino substituent is believed to migrate before ammonia is lost. Ribonucleotide reductase is the only known example of a B12-dependent reaction that does not involve the interchange of a functional group and a hydrogen atom, as shown in reaction g, and it also differs in several other respects [22].
The main focus of the work discussed in this chapter is the mechanism by which the B12-dependent carbon-skeleton rearrangements occur. However, before discussing these reactions specifically, we present more information about how coenzyme B12 mediates rearrangements in general.
[Scheme 2. Representative examples of the three classes of coenzyme-B12-dependent reactions: carbon-skeleton mutases, (a) 2-methyleneglutarate mutase, (b) methylmalonyl-CoA mutase and (c) glutamate mutase; aminomutases, (d) L-leucine-2,3-aminomutase; eliminases and reductases, (e) diol dehydratase, (f) ethanolamine ammonia lyase and (g) ribonucleotide reductase.]

2.3. The Bound Free-Radical Hypothesis: How Does Coenzyme B12 Work?
As implied in the Introduction, it is generally accepted that the mechanism for B12-assisted rearrangements that best accounts for the experimental data involves free-radical intermediates [23]. This idea was far-sightedly proposed by Eggerer and co-workers as early as 1960 [13], and has subsequently been developed [24-26] into what is now a widely accepted mechanism. The first step in this mechanistic proposal is the homolysis of the cobalt-carbon bond of the coenzyme (step a in Scheme 3) to produce cob(II)alamin (Ie) and the 5'-deoxyadenosyl radical (Ado-CH2•). This latter species is thought to abstract a hydrogen atom from a substrate molecule, giving 5'-deoxyadenosine (Ado-CH3) and a substrate-derived radical (Scheme 3, step b). The substrate-derived radical is then transformed into a product-related radical (Scheme 3, step c), which is converted to product by abstracting a hydrogen atom from the methyl group of 5'-deoxyadenosine (Scheme 3, step d). Re-formation of the cobalt-carbon bond of the coenzyme (step e) completes the catalytic cycle.

Scheme 3. Schematic representation of the bound free-radical hypothesis for B12-catalyzed reactions:
(a) AdoCbl → Ado-CH2• + cob(II)alamin
(b) Ado-CH2• + Substrate-H → Ado-CH3 + Substrate•
(c) Substrate• → Product•
(d) Product• + Ado-CH3 → Product-H + Ado-CH2•
(e) Ado-CH2• + cob(II)alamin → AdoCbl
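To check the logic of the cycle in Scheme 3, the five steps can be summed. Every intermediate should then cancel and the coenzyme should be regenerated, leaving only the net conversion of substrate to product. The Python sketch below does this bookkeeping with a `collections.Counter`. The species labels are illustrative strings chosen here, not notation from the original scheme; for example, "Substrate*" stands for the substrate-derived radical.

```python
from collections import Counter

# Steps (a)-(e) of Scheme 3 as (reactants, products).
# Labels are illustrative; "*" marks a radical centre.
STEPS = {
    "a": (["AdoCbl"], ["Ado-CH2*", "cob(II)alamin"]),               # Co-C homolysis
    "b": (["Ado-CH2*", "Substrate-H"], ["Ado-CH3", "Substrate*"]),  # H abstraction
    "c": (["Substrate*"], ["Product*"]),                            # radical rearrangement
    "d": (["Product*", "Ado-CH3"], ["Product-H", "Ado-CH2*"]),      # H back-transfer
    "e": (["Ado-CH2*", "cob(II)alamin"], ["AdoCbl"]),               # Co-C re-formation
}


def net_reaction(steps):
    """Sum all steps; return (species net consumed, species net formed)."""
    balance = Counter()
    for reactants, products in steps.values():
        balance.subtract(reactants)  # reactants count negative
        balance.update(products)     # products count positive
    consumed = sorted(s for s, n in balance.items() if n < 0)
    formed = sorted(s for s, n in balance.items() if n > 0)
    return consumed, formed


consumed, formed = net_reaction(STEPS)
print(" + ".join(consumed), "->", " + ".join(formed))
```

Running it prints `Substrate-H -> Product-H`. If step (e) is dropped from the dictionary, AdoCbl shows up as consumed, and Ado-CH2* and cob(II)alamin show up as formed. In other words, the coenzyme is not turned over catalytically without the final re-formation step.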
Support for the bound free-radical hypothesis (i.e., Scheme 3) comes from a variety of sources and has been summarized in several comprehensive reviews [5, 6, 26-30]. Early evidence was obtained from electron spin resonance (ESR) [31] and ultraviolet (UV) [32] spectroscopic results. In addition, labeling experiments provided evidence of hydrogen transfer between the coenzyme and the substrate of a variety of B12-dependent enzymes [33, 34]. Objections have been raised to the hydrogen-transfer step [29], on the grounds that the abstractions are proposed to take place at relatively unactivated positions. Nevertheless, calculations of the thermodynamics of the hydrogen-transfer steps support this mechanistic proposal [21, 35]. Evidence supporting Co-C bond homolysis has been obtained from the X-ray crystal structures of several B12-binding proteins [36-39], in which cobalt is found in a form consistent with Co(II) (i.e., pentacoordinated). Numerous ESR studies have also provided support for the homolysis of the cobalt-carbon bond [31, 40-43]. ESR spectra of the holoenzymes can be analyzed in terms of two dissimilar, strongly interacting spins [41]. One spin is almost certainly cob(II)alamin [42], while the other is most likely a carbon-centered radical [41]. One of the most important ESR contributions has come from a recent investigation of glutamate mutase from Clostridium cochlearium [43]. There, a signal was found to arise from interactions between cob(II)alamin and an organic radical approximately 6.6 ± 0.9 Å apart, a distance in striking agreement with recent crystallographic data [39]. The most abundant organic radical was concluded to be the 4-glutamyl substrate radical, an assignment supported by recent theoretical calculations of the relevant ESR parameters [35]. Given the similarity between the ESR spectrum of glutamate mutase and the spectra of other B12-dependent carbon-skeleton mutases, it is tempting to speculate that substrate-related radicals are also the major radicals in the ESR spectra of the other enzymes.

Neither the Co-C bond homolysis nor the hydrogen-transfer step is fully understood, but both are widely accepted in the literature. We therefore also accept the occurrence of both steps. This leads to a stage in the mechanism where cob(II)alamin, 5'-deoxyadenosine and a substrate-derived radical are all bound to the protein. From this state, the system must progress to the formation of the product and the re-formation of 5'-deoxyadenosylcobalamin. Exactly how the substrate-derived radical is transformed into the product-related radical (step c in Scheme 3) is undoubtedly the least well understood step in the bound free-radical hypothesis. However, given the abundance of ESR data available for B12-dependent reactions, we can be confident that these reactions do proceed through pathways involving free radicals, and we now turn our attention to the radical rearrangement mechanism.
2.4. The Radical Rearrangement Mechanism
Although radical pathways directly involving cobalt have been postulated [19, 44], there does not appear to be significant evidence in this direction. The probable non-involvement of cobalt in the rearrangement has led some authors to propose that the role of coenzyme B12 in biological systems is merely to act as a 'reversible free-radical carrier', providing a reservoir from which 5'-deoxyadenosyl radicals can be released [5, 26, 28]. The key step in the transformation of substrate to product in B12-dependent reactions is then the rearrangement of the substrate-derived radical to the product-related radical (Scheme 3, step c) [6, 25, 26], which involves a 1,2-shift in a free radical. Not a great deal is known about such reactions, except that they are relatively rare [45] and, in some cases, associated with a high activation energy [45]. A brief outline of possible mechanisms is therefore useful. We have considered four mechanistic possibilities (Scheme 4) for the migration of a functional group X (bonded to carbon) to an adjacent carbon atom bearing an unpaired electron. Mechanism a depicts a transient fragmentation to form an alkene and the radical X•. Re-addition of X• to the adjacent ethylenic carbon atom yields the rearranged radical. This mechanism, referred to as fragmentation-recombination, was originally suggested as a possibility many years ago [23] and discussed again somewhat later [46]. It is generally thought to be energetically more demanding than the alternative intramolecular pathways (see below). However, recent experiments [47] found that mimics of the fragmented intermediate states are able to inhibit the action of two carbon-skeleton mutases, and these results have caused the mechanism to be seriously reconsidered in the B12 field. More specifically, fragmentation-recombination mechanisms of this kind have been suggested as a possible unifying hypothesis for the B12-dependent carbon-skeleton mutases [47].
Mechanisms b-d all involve intramolecular migration of the group X without detachment from the two-carbon unit. The first possibility of this type (mechanism b) will be referred to as the concerted mechanism; in it, the symmetrical bridged species is a transition structure. Such rearrangements are usually associated with large energy barriers [45, 48], often requiring one-electron occupancy of a high-energy orbital [23]. Exceptions are observed with migrating groups involving low-lying d orbitals [49] or containing π-electron systems [50].
[Scheme 4. Possible mechanisms for the 1,2-migration of a group X in a free radical, X-C-C• → •C-C-X: (a) fragmentation-recombination, via X• plus an alkene; (b) concerted migration, via a symmetrical bridged transition structure; (c) addition-elimination, via a cyclic intermediate; (d) migration of the protonated group XH+.]
An unsaturated migrating group allows for a slightly different mechanism from mechanism b. In this case, the migrating group X contains a pair of doubly bonded atoms (A=B, with A bonded to the framework carbon; see Scheme 5). This allows an intramolecular addition of the unpaired electron to atom A of the π system, forming a stable intermediate in which the unpaired electron resides on atom B. The three-membered ring thus formed may then undergo elimination, by homolytic fission of the appropriate adjacent bond, to give the rearranged product. This pathway (mechanism c) is referred to as the addition-elimination mechanism and has received a moderate amount of support in the B12 field [5, 26]. Formally, the concerted mechanism (b) and the addition-elimination mechanism (c) are distinguished by whether the cyclic species is a transition structure or a stable intermediate. This distinction is not always clear-cut. In some cases the well containing the intermediate becomes very shallow, making it difficult to classify the rearrangement one way or the other. There do exist, however, definitive examples of both classes of behavior (see later). An alternative to forming a three-membered ring via mechanism c is the addition of the unpaired electron to atom B of the migrating group. This gives a four-membered ring with the unpaired electron residing on atom A (Scheme 5). Ring opening can then lead to an isomeric radical, but not to the one corresponding to migration of the A=B (X) fragment.

[Scheme 5. Possible ring-closing/ring-opening modes in the case of an unsaturated migrating group X (Scheme 4, mechanism c, X = A=B): addition at A gives a three-membered ring with the unpaired electron on B; addition at B gives a four-membered ring with the unpaired electron on A.]
A further (and, in the context of the current discussion, final) variation on the intramolecular mechanism arises from protonation of the migrating group (mechanism d). Previous studies have shown that protonation can facilitate the 1,2-shift in cases where it might otherwise seem unfavorable [23, 51, 52]. This has been explained as being due either to an increase in the cationic nature of the shift [23] or to the removal of the need for one-electron occupancy of a high-energy orbital [52]. None of the above pathways had been well characterized in the literature prior to this work. The theoretical studies discussed in this chapter examine these rearrangement mechanisms in detail, with the aim of determining which pathway is energetically most feasible for carbon-skeleton rearrangements.

The B12-dependent reactions typically proceed with a kcat between 40 and 150 s-1 at 30 °C [34, 53, 54]. Using techniques initially suggested by George et al. [52] and utilized by our group [21], we find that the barrier for the rate-limiting step in B12-catalyzed 1,2-shifts should lie between approximately 50 and 75 kJ mol-1. The barrier for the radical rearrangement step must therefore fall within or below this range. This provides an approximate reference point for assessing the proposed radical rearrangement mechanisms. Implicit in our approach is, of course, acceptance of the bound free-radical hypothesis. As we have seen in the previous sections, this appears to be the most likely mode of action for coenzyme B12. The widespread acceptance [1] of this pathway shows that it has, so far, stood the test of time. We now turn our attention to theoretical techniques that can provide accurate estimates of thermochemical data.

3. EVALUATION OF THEORETICAL TECHNIQUES
The theoretical calculations used to study the various mechanistic possibilities are based on ab initio molecular orbital theory. Previous theoretical studies of radical reactions have indicated that the results are sensitive to the level of theory used [55, 56]. In this section we therefore assess the performance of a variety of available techniques. The interconversion of the cyclopropylcarbinyl radical with the isomeric but-3-enyl radical (equation 2) is used to evaluate the theoretical methods:
$$\mathrm{c\text{-}C_3H_5CH_2^{\bullet} \;\rightleftharpoons\; CH_2{=}CHCH_2CH_2^{\bullet}} \qquad (2)$$
We chose this reaction because it has been proposed as a model for the rearrangement of 2-methyleneglutarate to 3-methylitaconate catalyzed by the coenzyme-B12-dependent enzyme 2-methyleneglutarate mutase [16, 26, 57]. More specifically, equation 2 represents the second step in the addition-elimination pathway (mechanism c, Scheme 4) for a 1,2-shift. In addition, this reaction has been widely studied experimentally [58] and has been described as 'the most precisely calibrated radical reaction' [59]. To assess the performance of the different theoretical techniques directly, we present the barriers (ΔH‡) and reaction enthalpies (ΔH) at 0 K, excluding the zero-point vibrational energy. A meaningful comparison with experiment therefore requires the experimental activation energy and enthalpy of reaction to be back-corrected accordingly. We made these corrections using zero-point vibrational energies and temperature corrections calculated at the B3-LYP/6-31G(d) level. This leads to an experimental vibrationless ΔH‡ at 0 K of 31.2 kJ mol-1 [60]. Depending on which experimental value is used [59-61], the corresponding vibrationless enthalpy of reaction at 0 K (ΔH) is -6.9, -8.0, -11.6 or -19.4 kJ mol-1 [62]. The energies determined for the ring opening of the cyclopropylcarbinyl radical with a variety of higher-level procedures are displayed in Table 1 [62]. The energies are found to be relatively insensitive to the choice of geometry [62]. Unless otherwise noted, the energies in Table 1 were obtained with B3-LYP/6-31G(d) geometries, a choice based in part on comparisons with higher-level (QCISD) results. The initial entries in Table 1 correspond to the MP2 level of theory with unrestricted (UMP2), projected (PMP2) and restricted (RMP2) approaches, in combination with the 6-311+G(d,p) and 6-311+G(3df,2p) basis sets. UMP2 performs quite poorly, overestimating the barrier by some 35 kJ mol-1.
Projecting out the first spin contaminant (PMP2) remedies the situation somewhat, lowering the activation energy by approximately 30 kJ mol-1. The RMP2 results, which correspond to pure doublet states, also constitute a significant improvement over the UMP2 barriers. These results suggest that the large degree of spin contamination in the transition structure is the major contributor to the poor UMP2 result. The MP2 enthalpies deviate by around 10 kJ mol-1 from the experimental estimates, predicting the ring opening to be essentially thermoneutral. The B3-LYP barriers obtained with three different basis sets are in impressive agreement with experiment, particularly considering the relatively low cost of such calculations. In addition, the B3-LYP reaction enthalpies are quite close to those obtained with higher-level theoretical methods. These results are very encouraging for chemists with limited computational resources, or with potential applications to larger, related systems in mind.

Table 1. Calculated Barriers (ΔH‡) and Reaction Enthalpies (ΔH) (kJ mol-1) for the Ring Opening of the Cyclopropylcarbinyl Radical (a)

Method                      ΔH‡        ΔH
UMP2/6-311+G(d,p)           66.5       -0.2
UMP2/6-311+G(3df,2p)        66.6        1.0
PMP2/6-311+G(d,p)           36.2       -0.1
PMP2/6-311+G(3df,2p)        36.3        1.3
RMP2/6-311+G(d,p)           43.0       -0.7
RMP2/6-311+G(3df,2p)        41.4        0.8
B3-LYP/6-31G(d)             35.0       -9.5
B3-LYP/6-311+G(d,p)         30.3      -15.6
B3-LYP/6-311+G(3df,2p)      30.7      -14.1
CBS-RAD                     32.9       -9.1
CBS-Q                       32.1       -9.1
G2                          37.5      -11.3
G2(MP2)                     38.2      -11.2
G2M(RCC) (b)                41.3      -10.7
G2(MP2)-RAD                 35.5      -10.1
G2(MP2,SVP)-RAD             32.7      -11.3
G3(MP2)-RAD(p) (c)          31.6      -14.3
Experimental (d)            31.2 (e)  -6.9 (f), -8.0 (g), -11.6 (h), -19.4 (i)
(a) Unless otherwise specified, all values have been obtained using B3-LYP/6-31G(d) geometries, calculated without zero-point vibrational energies and taken from reference [62]. (b) Calculated at B3-LYP/6-311G(d,p) geometries. (c) Present work. (d) Experimental results have been corrected to 0 K and the zero-point energy contribution has been removed; see reference [62]. (e) From references [60] and [62]. (f) From references [60], [61a] and [62]. (g) From references [61a,b] and [62]. (h) From experimental heats of formation of reactant and product; see reference [62]. (i) From reference [59].

Single-point calculations were also performed with high-level methods based on the complete basis set (CBS) [63] and Gaussian-n [64] techniques (Table 1). In general, all of these methods perform reasonably well, and their energetics are in satisfactory agreement with one another. CBS-Q and CBS-RAD, which are based on QCISD(T) and CCSD(T) high-level calculations respectively, yield very similar results. Relative to the CBS techniques, the G2 methodology (i.e., G2 and G2(MP2)) shifts the reaction barrier appreciably, and the reaction enthalpy slightly, away from the experimental values. The G2M(RCC) technique [65] yields slightly worse results than standard G2 or CBS techniques. A small improvement over the conventional G2(MP2) method can be obtained with G2(MP2)-RAD, which implements restricted-open-shell versions of both coupled-cluster and perturbation theories. The latter method has been shown to yield generally reliable results when applied to open-shell systems [66]. Among the methods presented in Table 1, CBS-RAD, G2(MP2,SVP)-RAD and G3(MP2)-RAD(p) all yield barriers (ΔH‡) consistent with the experimental value, as well as reliable results for the reaction enthalpy. The RAD notation in these methods signifies modifications to standard procedures designed to improve their performance in describing the thermochemistry of radicals [66]. The modifications include replacing the high-level QCISD(T) calculation in the standard CBS, G2 and G3 techniques with CCSD(T). In addition, the RAD methods employed here use either B3-LYP/6-31G(d) (RAD) or B3-LYP/6-31G(d,p) (RAD(p)) geometries and zero-point corrections. The polarization functions on the hydrogen atoms in the RAD(p) procedures are included to provide a better description of species involved in hydrogen bonding (see later). For species without hydrogen bonding [62, 67], including these light-atom polarization functions has a minimal effect. G2(MP2,SVP)-RAD reduces the computational demands of the G2(MP2)-RAD method by using the 6-31G(d) basis set for the higher-level calculation. These modifications to the conventional CBS and Gaussian-n methods have been shown to produce improved predictions of thermochemical data for radicals [56, 66]. The energetics for the model reactions of the three B12-dependent carbon-skeleton rearrangements will therefore be discussed in terms of energies obtained with the CBS-RAD, G2(MP2,SVP)-RAD and G3(MP2)-RAD(p) techniques. These are applied on B3-LYP geometries optimized either with or without polarization functions on the hydrogen atoms.
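The comparison with experiment in Table 1 comes down to simple arithmetic on the barrier column. The Python sketch below recomputes each method's deviation from the experimental vibrationless barrier of 31.2 kJ mol-1 and lists the methods that fall within ±2 kJ mol-1. That tolerance is an arbitrary threshold chosen here for illustration; it is not a criterion used in the original work. Only the barrier is checked, because the experimental reaction enthalpy ranges from -6.9 to -19.4 kJ mol-1 depending on the source.

```python
# Barriers (Delta H double-dagger, kJ/mol; 0 K, vibrationless) transcribed from Table 1.
EXPERIMENTAL_BARRIER = 31.2

BARRIERS = {
    "UMP2/6-311+G(d,p)": 66.5,
    "UMP2/6-311+G(3df,2p)": 66.6,
    "PMP2/6-311+G(d,p)": 36.2,
    "PMP2/6-311+G(3df,2p)": 36.3,
    "RMP2/6-311+G(d,p)": 43.0,
    "RMP2/6-311+G(3df,2p)": 41.4,
    "B3-LYP/6-31G(d)": 35.0,
    "B3-LYP/6-311+G(d,p)": 30.3,
    "B3-LYP/6-311+G(3df,2p)": 30.7,
    "CBS-RAD": 32.9,
    "CBS-Q": 32.1,
    "G2": 37.5,
    "G2(MP2)": 38.2,
    "G2M(RCC)": 41.3,
    "G2(MP2)-RAD": 35.5,
    "G2(MP2,SVP)-RAD": 32.7,
    "G3(MP2)-RAD(p)": 31.6,
}


def deviations(barriers, reference):
    """Calculated minus experimental barrier for each method, rounded to 0.1 kJ/mol."""
    return {method: round(value - reference, 1) for method, value in barriers.items()}


def close_to_experiment(devs, tolerance=2.0):
    """Methods whose |deviation| <= tolerance, best agreement first."""
    hits = [method for method, d in devs.items() if abs(d) <= tolerance]
    return sorted(hits, key=lambda method: abs(devs[method]))


devs = deviations(BARRIERS, EXPERIMENTAL_BARRIER)
for method in close_to_experiment(devs):
    print(f"{method:24s} {devs[method]:+5.1f} kJ/mol")
```

With the Table 1 values, this selects G3(MP2)-RAD(p), the two larger-basis B3-LYP entries, CBS-Q, G2(MP2,SVP)-RAD and CBS-RAD. UMP2 sits about 35 kJ mol-1 too high, which matches the observations made in the text.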
Since we find that these methods yield results in good agreement with one another and with experiment, we present only G3(MP2)-RAD(p) data in the text and figures unless otherwise noted; results for the other methods are included in the tables for comparison. The thermochemistry for the rearrangements catalyzed by the three carbon-skeleton mutases will now be discussed.

4. 2-METHYLENEGLUTARATE MUTASE

The first B12-dependent enzyme to be considered is 2-methyleneglutarate mutase, which catalyzes the interconversion of 2-methyleneglutarate with (R)-3-methylitaconate [67]:
2-methyleneglutarate ⇌ (R)-3-methylitaconate    (3)
This transformation is part of a microbial metabolic pathway in which nicotinate is broken down into ammonia, CO2, acetate and pyruvate [16, 68]. Accepting the bound free-radical hypothesis implies that the crucial radical rearrangement step can be represented by reaction 4, in which a 2-methyleneglutarate-derived radical (1) is transformed into an (R)-3-methylitaconate-related radical (2) [6]:
1 → 2    (4)
There have been two major suggestions for the rearrangement in reaction 4:
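[Scheme 6 structures not reproduced: fragmentation-recombination via acrylate plus an acrylate-derived radical, and addition-elimination via a substituted cyclopropylcarbinyl radical.]
Scheme 6. Two possible pathways for the radical rearrangement catalyzed by 2-methyleneglutarate mutase.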
The first is the fragmentation-recombination pathway (see mechanism a, Scheme 4), with acrylate and an acrylate-derived radical as the intermediate state (Scheme 6). This possibility has been suggested only recently and is based on the reported inhibition of 2-methyleneglutarate mutase by acrylate [47]. The second suggested mechanism is the addition-elimination pathway (see mechanism c, Scheme 4), with a substituted cyclopropylcarbinyl radical as the intermediate (Scheme 6) [6, 26]. Ab initio molecular orbital theory is ideally suited to comparing the suggested pathways. However, the system shown in Scheme 6 is relatively large and conformationally flexible, making its detailed theoretical investigation computationally demanding, though not impossible. In the present chapter, we use a 'model system' approach in which substituents such as carboxylate groups are replaced by simpler and computationally less expensive hydrogen atoms [67]. We have tested the merits of this approach by performing additional calculations on larger systems that account for the carboxylate groups [35, 67, 69]. Although a discussion of these results is beyond the scope of the present work, we note that the effect of the carboxylate groups is small in most cases, although their inclusion can sometimes be important. Useful information can therefore be obtained by comparing the results for these small model systems. One advantage of this approach is that the smaller radicals may be treated with more accurate theoretical techniques than is possible for species that include the carboxylate groups.
Applying the suggested simplifications to the 2-methyleneglutarate-mutase-catalyzed rearrangement results in the system shown in reaction 5:
3 → 3'    (5)
Thus, we have effectively replaced the transformation of 2-methyleneglutarate to (R)-3-methylitaconate with the degenerate rearrangement of the but-3-enyl radical [67]. This approach greatly simplifies the calculations while keeping the rearranging carbon skeleton intact. The possible modes for the degenerate rearrangement of the but-3-enyl radical are summarized in Scheme 7, and the relative energies of the species involved are displayed in Table 2 and Figure 1.
[Scheme 7 structures not reproduced. Path a: 3 → TS:3→4 → 4 → TS:4→3' → 3'; path b: 3 → TS:3→5 → 5 → TS:5→3' → 3'; path c: 3-H+ → TS:3-H+→3'-H+ → 3'-H+.]
Scheme 7. Possible mechanisms for the degenerate rearrangement of the but-3-enyl radical (3).
4.1. Fragmentation-Recombination

The fragmentation-recombination mechanism (step a, Scheme 7) for the rearrangement of the but-3-enyl radical (3) proceeds via a bond fission to give the vinyl radical plus ethylene (collectively referred to as 4), followed by an intermolecular radical addition to form the rearranged product (3'). TS:3→4 is found to lie quite high, approximately 150 kJ mol-1 above the but-3-enyl radical (3), demonstrating that the fragmentation step is energetically unfavorable. We find that the energy increases relatively steeply as the two fragments separate, rising to more than 50 kJ mol-1 above that of 3 at a separation of just 1.8 Å. This finding may be relevant when the reaction is considered within the confines of the enzyme active-site cavity. The two fragments (4) lie in a
relatively shallow energy well (10-20 kJ mol-1 deep), indicating that if fragmentation were to be effected, recombination could occur relatively easily. Indeed, recent experiments [70] have given an approximate activation energy for this process of 30 kJ mol-1, only slightly higher than our calculated values.
Table 2. Relative Energies (kJ mol-1)a of the Species Involved in the Degenerate Rearrangement of the But-3-enyl Radical (3, Scheme 7) at 0 K

Species              CBS-RADb    G3(MP2)-RAD(p)
3                       0.0           0.0
TS:3→4                147.5         150.9
4                     137.2         134.6
TS:3→5                 42.4          46.3
5                      12.4          17.6
3-H+                    0.0           0.0
TS:3-H+→3'-H+           8.8           8.2

a Energies relative to either 3 or 3-H+. See text. b Reference [67].
4.2. Addition-Elimination

The presence of a C=C double bond in the migrating group of the but-3-enyl radical introduces the possibility of the addition-elimination mechanism (path b, Scheme 7), in which the appropriate intermediate is the previously discussed cyclopropylcarbinyl radical (5). We find a significant preference (ca 100 kJ mol-1) for the addition-elimination pathway over the fragmentation-recombination pathway. Thus it is more favorable, in the gas phase at least, for the migrating HC=CH2 group to stay bonded to the remaining framework than to become detached from it. The cyclopropylcarbinyl radical intermediate involved in the addition-elimination mechanism is predicted to lie in a well of depth 30 kJ mol-1.
4.3. Facilitation by Protonation

Guided by previous studies [23, 71], we investigated the facilitation of the concerted 1,2-shift in the but-3-enyl radical by protonation of the migrating group (step c, Scheme 7). Of the two possible protonation sites on the migrating group, we have chosen the terminal carbon for the current investigation (3-H+). The resulting reaction is equivalent to the degenerate rearrangement of a partially ring-opened methylcyclopropane radical cation. The unsubstituted cyclopropane radical cation has received considerable experimental [72] and theoretical [73] attention and is thought to exist as three equivalent 2A1 partially ring-opened structures. These three equivalent structures interconvert relatively easily via three equivalent 2B2 structures. Although the symmetry is reduced in the methyl-substituted system, we are able to observe the appropriate 1,2-shift operating by a mechanism analogous to that
of the unsubstituted case. The barrier to interconversion of the two methylcyclopropane radical cations (less than 10 kJ mol-1, Table 2) is found to be significantly lower than the barrier for the unassisted addition-elimination. The energetics for the three pathways discussed for the degenerate rearrangement of the but-3-enyl radical are summarized in Figure 1. The potential energy diagram illustrates the reduced energy requirement upon moving from the fragmentation-recombination pathway to the addition-elimination mechanism. The benefits of substrate protonation are also clearly evident [67].
[Figure 1 plot not reproduced. Relative energy (kJ mol-1) along the reaction coordinate: TS:3→4 (150.9), 4 (134.6), TS:4→3' (150.9); TS:3→5 (46.3), 5 (17.6), TS:5→3' (46.3); TS:3-H+→3'-H+ (8.2); reactants 3, 3-H+; products 3', 3'-H+.]
Figure 1. Schematic G3(MP2)-RAD(p) energy profile for the degenerate rearrangement of the but-3-enyl radical (see Scheme 7). Relative energies (kJ mol-1) in parentheses.
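As a quick consistency check, the energy differences quoted in Sections 4.1-4.3 can be recomputed from Table 2. The Python sketch below is illustrative only: `TABLE2` and `derived_quantities` are hypothetical names, the values are simply transcribed from Table 2, and the sketch assumes that, by the symmetry of the degenerate rearrangement, TS:4→3' and TS:5→3' lie at the same energies as TS:3→4 and TS:3→5.

```python
# Relative energies (kJ/mol, 0 K) transcribed from Table 2.
# Neutral species are relative to 3; the protonated pair is relative to 3-H+.
TABLE2 = {
    "CBS-RAD": {
        "3": 0.0, "TS:3->4": 147.5, "4": 137.2,
        "TS:3->5": 42.4, "5": 12.4,
        "3-H+": 0.0, "TS:3-H+->3'-H+": 8.8,
    },
    "G3(MP2)-RAD(p)": {
        "3": 0.0, "TS:3->4": 150.9, "4": 134.6,
        "TS:3->5": 46.3, "5": 17.6,
        "3-H+": 0.0, "TS:3-H+->3'-H+": 8.2,
    },
}


def derived_quantities(e):
    """Energy differences (kJ/mol) discussed in Sections 4.1-4.3."""
    return {
        # Barrier to fragmentation of the but-3-enyl radical.
        "fragmentation_barrier": e["TS:3->4"] - e["3"],
        # Recombination barrier out of the vinyl + ethylene well (4); by the
        # symmetry of the degenerate reaction, TS:4->3' lies at the energy of TS:3->4.
        "fragment_well_depth": e["TS:3->4"] - e["4"],
        # How much lower the addition-elimination TS lies than the fragmentation TS.
        "pathway_preference": e["TS:3->4"] - e["TS:3->5"],
        # Depth of the cyclopropylcarbinyl radical (5) well.
        "cyclopropylcarbinyl_well_depth": e["TS:3->5"] - e["5"],
        # Barrier for the 1,2-shift in the protonated system.
        "protonated_barrier": e["TS:3-H+->3'-H+"] - e["3-H+"],
    }


for method, energies in TABLE2.items():
    print(method)
    for name, value in derived_quantities(energies).items():
        print(f"  {name:31s} {value:6.1f}")
```

With these inputs the fragment well comes out at about 16 kJ mol-1 (G3(MP2)-RAD(p)) and 10 kJ mol-1 (CBS-RAD), the preference for addition-elimination at about 105 kJ mol-1, and the cyclopropylcarbinyl well at 29-30 kJ mol-1, in line with the rounded values in the text.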
5. METHYLMALONYL-CoA MUTASE

The second B12-dependent enzyme to be discussed is methylmalonyl-CoA mutase, which catalyzes the transformation of (R)-methylmalonyl-CoA to succinyl-CoA [69]:
(R)-methylmalonyl-CoA → succinyl-CoA    (6)
This step is the culmination of a reaction sequence in which propionyl-CoA, a toxic metabolite derived from the degradation of fats, is removed from circulation. Carboxylation of propionyl-CoA gives (S)-methylmalonyl-CoA, which is epimerized to (R)-methylmalonyl-CoA. Conversion of the (R)-isomer to succinyl-CoA allows further metabolism via the Krebs cycle [74]. Methylmalonyl-CoA mutase is also the only adenosylcobalamin-dependent enzyme known to participate in human metabolism, and as such has received significant study [30, 37, 38].
Acceptance of the bound free-radical hypothesis in this instance results in the radical rearrangement shown in reaction 7 [47, 75]:
6 → 7    (7)
As with the 2-methyleneglutarate mutase system, detailed computational investigation of the methylmalonyl-CoA mutase system is complex. We therefore continue to use the 'model system' approach and replace the SCoA and carboxylate groups with hydrogen atoms. This simplification results in the degenerate rearrangement of the 3-propanal radical (8) [69, 76]:
8 → 8'    (8)
We have investigated three distinct mechanistic possibilities (see Scheme 8) for the rearrangement shown in reaction 8 [69]. The relative energies are displayed in Table 3 and Figure 2.
[Scheme 8 structures not reproduced. Path a: 8 → TS:8→9 → 9 → TS:9→8' → 8'; path b: 8 → TS:8→8' → 8'; path c: 8-H+ → TS:8-H+→8'-H+ → 8'-H+.]
Scheme 8. Possible mechanisms for the degenerate rearrangement of the 3-propanal radical (8).
Table 3. Relative Energies (kJ mol-1)a of the Species Involved in the Degenerate Rearrangement of the 3-Propanal Radical (8, Scheme 8) at 0 K

Species            CBS-RAD(p)b    G2(MP2,SVP)-RAD(p)b    G3(MP2)-RAD(p)
8                      0.0              0.0                   0.0
TS:8→9                93.2             96.1                  95.2
9                     66.9             63.6                  63.3
TS:8→8'               46.9             51.8                  53.0
8-H+                   0.0              0.0                   0.0
TS:8-H+→8'-H+         10.0             12.7                  13.9

a Energies relative to either 8 or 8-H+. See text. b Reference [69].

5.1. Fragmentation-Recombination
The first pathway (a, Scheme 8) for the rearrangement of the 3-propanal radical involves homolytic bond fission in 8 to give the formyl radical plus ethylene (collectively referred to as 9), followed by an intermolecular radical addition to form the rearranged product 8'. As was the case for the but-3-enyl radical, we find the fragmentation-recombination pathway for reaction 8 to be associated with a relatively high barrier (95.2 kJ mol-1, see Table 3). The separated fragments (9) lie 63.3 kJ mol-1 above the reactant (8).

5.2. Addition-Elimination
The second possible pathway (route b, Scheme 8) involves an intramolecular migration of the formyl group in what is commonly thought of as a two-step process. The first step involves an intramolecular radical addition to the carbonyl carbon to form an intermediate cyclopropyloxy radical (shown in Scheme 8 as TS:8→8'). The three-membered ring can then undergo a ring-opening elimination reaction to give the desired product [77] (the addition-elimination mechanism). We find that the cyclopropyloxy radical lies in a very shallow well (0.3 kJ mol-1 deep) on the electronic potential energy surface, which disappears upon inclusion of zero-point vibrational energy. We therefore conclude that the cyclopropyloxy radical does not correspond to a stable intermediate and that the addition-elimination pathway is essentially a single-step process (as shown in mechanism b, Scheme 4). The barrier for this intramolecular rearrangement (53.0 kJ mol-1, see Table 3) [78] is considerably lower than that calculated for the fragmentation-recombination pathway.

5.3. Facilitation by Protonation
Encouraged by the results of previous calculations showing the beneficial effects of protonation in facilitating 1,2-shifts in free radicals ([20, 23, 67, 71] and Section 4.3), and following the specific suggestion of protonating the migrating carbonyl group [23, 71], we investigated the rearrangement of the protonated 3-propanal radical (8-H+). The resulting 1,2-shift of the CHOH group (Scheme 8, pathway c) is found to proceed via a single transition structure with an extremely low barrier (13.9 kJ mol-1) [79]. We believe that this result is particularly important for understanding how methylmalonyl-CoA mutase catalyzes the interconversion of the substrate-derived and product-related radicals [69]. The energetics of the three pathways in Scheme 8 are compared in Figure 2 for the model of the methylmalonyl-CoA-mutase-catalyzed reaction. As for the model of the 2-methyleneglutarate rearrangement, the fragmentation-recombination mechanism is associated with the largest barrier. The barrier height decreases on moving to the addition-elimination mechanism and is lowered further upon protonation.
[Figure 2 plot not reproduced. Relative energy (kJ mol-1) along the reaction coordinate: TS:8→9 (95.2), 9 (63.3), TS:9→8' (95.2); TS:8→8' (53.0); TS:8-H+→8'-H+ (13.9); reactants 8, 8-H+ and products 8', 8'-H+ (0.0).]
Figure 2. Schematic G3(MP2)-RAD(p) energy profile for the degenerate rearrangement of the 3-propanal radical (see Scheme 8). Relative energies (kJ mol-1) in parentheses.
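The claim that the three composite methods agree well can be made concrete by measuring how far apart they are for each species in Table 3. The following Python sketch is only a transcription-and-comparison aid; `TABLE3`, `method_spread` and `pathway_order_holds` are hypothetical names, and the "spread" is simply the largest minus the smallest value across the three methods.

```python
# Relative energies (kJ/mol, 0 K) transcribed from Table 3.
# Column order: CBS-RAD(p), G2(MP2,SVP)-RAD(p), G3(MP2)-RAD(p).
TABLE3 = {
    "TS:8->9": (93.2, 96.1, 95.2),
    "9": (66.9, 63.6, 63.3),
    "TS:8->8'": (46.9, 51.8, 53.0),
    "TS:8-H+->8'-H+": (10.0, 12.7, 13.9),
}
METHODS = ("CBS-RAD(p)", "G2(MP2,SVP)-RAD(p)", "G3(MP2)-RAD(p)")


def method_spread(values):
    """Largest minus smallest value across the three methods (kJ/mol)."""
    return max(values) - min(values)


def pathway_order_holds(column):
    """True if fragmentation > addition-elimination > protonated barrier for one method."""
    frag = TABLE3["TS:8->9"][column]
    add_elim = TABLE3["TS:8->8'"][column]
    protonated = TABLE3["TS:8-H+->8'-H+"][column]
    return frag > add_elim > protonated


for species, values in TABLE3.items():
    print(f"{species:16s} spread = {method_spread(values):4.1f} kJ/mol")
for i, method in enumerate(METHODS):
    print(f"{method:20s} ordering preserved: {pathway_order_holds(i)}")
```

The largest spread is about 6 kJ mol-1 (for TS:8→8'), whereas the gaps between successive pathways are never smaller than about 37 kJ mol-1 at any of the three levels, so the qualitative ordering of the mechanisms does not depend on the method chosen.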
6. GLUTAMATE MUTASE

The final B12-dependent carbon-skeleton rearrangement to be discussed is catalyzed by glutamate mutase, which interconverts (S)-glutamate and (2S,3S)-3-methylaspartate:
(S)-glutamate ⇌ (2S,3S)-3-methylaspartate    (9)
This reaction represents the first step in the fermentation of glutamate to acetate and butyrate in many clostridia [4, 80]. Once again, accepting the bound free-radical hypothesis leads to the following radical rearrangement:
10 → 11    (10)
The reaction catalyzed by glutamate mutase differs from those catalyzed by other carbon-skeleton mutases because the migrating group is saturated. Thus, a bridged intermediate, which provides a lower-energy pathway for the other carbon-skeleton rearrangements, is more difficult to conceptualize. However, since so many experimental similarities have been observed between the enzymes, such as the nature of the reactions catalyzed, the composition of the enzymes, the cofactors required and ESR data [6, 16], it is desirable to look for a mechanistic link between the radical rearrangements catalyzed by B12-dependent carbon-skeleton mutases. Comparing possible radical pathways for this reaction with those considered for the other carbon-skeleton rearrangements will indicate whether all the carbon-skeleton rearrangements are likely to occur through similar pathways, or whether nature has different, equally efficient, ways to deal with related reactions. As for the previous two rearrangements, the carboxylate groups in 10 and 11 were replaced with hydrogen atoms, and the computational problem reduces to investigating the rearrangement of the radical derived from propylamine [35]:
12 → 12'    (11)
Once again, a number of different pathways for the degenerate rearrangement of the aminopropyl radical can be considered (Scheme 9) [35], including pathways analogous to those examined as models for the reactions catalyzed by 2-methyleneglutarate mutase (Scheme 7) and methylmalonyl-CoA mutase (Scheme 8).
6.1. Fragmentation-Recombination Pathway for the Rearrangement of the Aminopropyl Radical

Proposals for a "unified mechanism" for all B12-assisted carbon-skeleton rearrangements have focused on the fragmentation-recombination pathway [16, 47]. This proposal is attractive since the formation of separated products can easily be envisioned for all the reactions. The fragmentation-recombination pathway for the rearrangement of the aminopropyl radical (path a, Scheme 9) initially produces ethylene plus the aminomethyl radical (collectively referred to as 13). This step is associated with a high barrier (97.2 kJ mol-1, Table 4). We note that the separated fragments lie in a shallow energy well (36.2 kJ mol-1), indicating
that recombination of the separated products is a favorable process if fragmentation occurs. The prediction of a high barrier for this rearrangement pathway is consistent with our fragmentation-recombination results for the model systems used to study the rearrangements of the 2-methyleneglutarate and (R)-methylmalonyl-CoA substrate-derived radicals.
[Scheme 9 structures not reproduced. Path a: 12 → TS:12→13 → 13 → TS:13→12' → 12'; path b: 14 → TS:14→15 → 15 → TS:15→14' → 14'; path c: 14 → TS:14→16 → 16 → TS:16→14' → 14'; path d: 14-H+ → TS:14-H+→14'-H+ → 14'-H+; formation of 14-H+ from 12 by hydride removal is discussed in Section 6.3.]
Scheme 9. Possible mechanisms for the degenerate rearrangements of the aminopropyl (12) and iminopropyl (14) radicals.
6.2. Rearrangement of the Iminopropyl Radical

Because of the high energy associated with the fragmentation-recombination mechanism in the (S)-glutamate model system, it is attractive to consider alternative rearrangement pathways. It has been proposed that interactions between a group within the enzyme and the amino group of (S)-glutamate may lead to the formation of an imine and thereby facilitate the rearrangement of the substrate by permitting a cyclic intermediate [81, 82]. There are precedents for the transformation of amines to imines in other enzyme systems [6, 83]. Although experimental evidence for such groups in glutamate mutase remains to be found [39, 53, 81, 84], it is still of interest to investigate the energetics of this reaction pathway and to determine whether it provides a lower
energy route. Thus, three mechanistic pathways will be considered for the rearrangement of the iminopropyl radical (reaction 12 and Scheme 9, paths b, c and d):
14 → 14'    (12)
Relative energies for the species involved in this reaction pathway are included in Table 4 and Figure 3.

Table 4. Relative Energies (kJ mol-1)a for the Species Involved in the Rearrangement of the Aminopropyl (12) and Iminopropyl (14) Radicals (Scheme 9) at 0 K

Species               G3(MP2)-RAD(p)
12                         0.0
TS:12→13                  97.2
13                        61.1
14                         0.0
TS:14→15                 118.0
15                        90.2
TS:14→16                  52.4
16                        37.6
14-H+                      0.0
TS:14-H+→14'-H+           19.0

a Energies relative to 12, 14 or 14-H+. See text.
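Several of the numbers quoted in the following discussion are differences between Table 4 entries. The Python sketch below reproduces them; `TABLE4`, `diff` and `differences` are hypothetical names, and two assumptions are built in: barrier comparisons between pathways subtract barrier heights that are each measured from their own reactant (12, 14 or 14-H+), and, for the degenerate rearrangements, the transition structure leading out of an intermediate lies at the same energy as the one leading into it.

```python
# G3(MP2)-RAD(p) relative energies (kJ/mol, 0 K) transcribed from Table 4.
# 12/13 are relative to 12, 14/15/16 to 14, and the protonated pair to 14-H+.
TABLE4 = {
    "12": 0.0, "TS:12->13": 97.2, "13": 61.1,
    "14": 0.0, "TS:14->15": 118.0, "15": 90.2,
    "TS:14->16": 52.4, "16": 37.6,
    "14-H+": 0.0, "TS:14-H+->14'-H+": 19.0,
}


def diff(upper, lower):
    """TABLE4[upper] - TABLE4[lower], rounded to the 0.1 kJ/mol table precision."""
    return round(TABLE4[upper] - TABLE4[lower], 1)


differences = {
    # TS:15->14' is degenerate with TS:14->15, so this is the recombination barrier.
    "recombination barrier from 15": diff("TS:14->15", "15"),
    # Fragmentation barrier of the imine (from 14) minus that of the amine (from 12).
    "imine minus amine fragmentation barrier": diff("TS:14->15", "TS:12->13"),
    # TS:16->14' is degenerate with TS:14->16, so this is the barrier out of 16.
    "barrier from cyclic intermediate 16": diff("TS:14->16", "16"),
    # Neutral addition-elimination barrier minus the protonated barrier.
    "barrier lowering on protonation": diff("TS:14->16", "TS:14-H+->14'-H+"),
}

for label, value in differences.items():
    print(f"{label:40s} {value:5.1f} kJ/mol")
```

Because the table entries are themselves rounded to 0.1 kJ mol-1, differences computed this way can occasionally deviate by 0.1 kJ mol-1 from values obtained from the unrounded energies.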
Once again, the fragmentation-recombination pathway (path b, Scheme 9) is associated with a very high-energy transition structure (118.0 kJ mol-1) relative to the reactant radical (14). Even the separated fragments (15) lie 90.2 kJ mol-1 above the reactant. A low barrier (27.8 kJ mol-1) between the separated fragments and the product radical indicates that product formation from the separated fragments will occur readily, provided fragmentation is achieved. Clearly, this pathway is less favorable than the fragmentation-recombination of the aminopropyl radical: not only is the barrier height increased by 20.8 kJ mol-1, but the pathway is also more complicated, since an imine must be formed in a preliminary step. The second possible route for the rearrangement of the iminopropyl radical involves the formation of a cyclic intermediate (16) and subsequent elimination of the amino carbon to yield the product radical (path c, Scheme 9). The barrier for this addition-elimination pathway (52.4 kJ mol-1) is significantly lower than the barrier for the fragmentation-recombination of either the iminopropyl radical (118.0 kJ mol-1) or the aminopropyl radical (97.2 kJ mol-1). Additionally,
the cyclic intermediate is only 37.6 kJ mol-1 higher in energy than the reactant radical and, if formed, is separated from the product radical by only a small barrier (14.8 kJ mol-1). Therefore, in the gas phase, an intramolecular rearrangement is favored over one involving bond fragmentation for the iminopropyl radical. The third possibility for radical rearrangement (path d, Scheme 9) involves protonation of the iminopropyl radical. In contrast to the addition-elimination mechanism for the neutral system, the protonated cyclic structure (TS:14-H+→14'-H+) is found to be a transition structure rather than a stable intermediate. This transition structure lies 19.0 kJ mol-1 above the reactant radical (14-H+). Protonation of the reactant radical thus leads to a significant reduction in the barrier height (by 33.4 kJ mol-1).
[Figure 3 plot not reproduced. Relative energy (kJ mol-1) along the reaction coordinate: TS:12→13 (97.2), 13 (61.1), TS:13→12' (97.2); TS:14→15 (118.0), 15 (90.2), TS:15→14' (118.0); TS:14→16 (52.4), 16 (37.6), TS:16→14' (52.4); TS:14-H+→14'-H+ (19.0); reactants 12, 14, 14-H+ and products 12', 14', 14'-H+ (0.0).]
Figure 3. Schematic G3(MP2)-RAD(p) energy profile for the degenerate rearrangements of the aminopropyl (12) and iminopropyl (14) radicals (see Scheme 9). Relative energies (kJ mol-1) in parentheses.

6.3. Hydride Ion Removal from the Aminopropyl Radical

The pathway involving cyclization of a protonated migrating group provides a very appealing alternative to the fragmentation-recombination pathway. Given the lack of evidence for imine formation, it is interesting to note that the protonated imine (14-H+) in the model system can alternatively arise, formally, from removal of a hydride ion from the parent (saturated) aminopropyl radical (12) (path e, Scheme 9). To examine the feasibility of cation formation through hydride ion abstraction, we estimated the barrier for hydride ion removal. Because of complications associated with the gas-phase reaction, we modeled the abstraction by considering the 1-aminoethyl cation abstracting a hydride ion from the neutral aminopropyl radical. Although hydride abstraction might have been expected to be a high-energy process, the calculated barrier for this model reaction is only 13 kJ mol-1, a value small enough to have only a minimal effect on the enzymatic turnover rate. The pathways discussed for the rearrangements of the aminopropyl and iminopropyl radicals are compared in Figure 3. Once again, the fragmentation-recombination mechanism offers a high-energy route. The benefit of hydride ion removal from the parent system, leading to a protonated imine, is also clearly apparent. As discussed in the following section, trends among the model systems begin to emerge.

7. COMPARISON OF THE MODELS FOR B12-DEPENDENT CARBON-SKELETON MUTASES
To obtain an overview of the reactions catalyzed by B12-dependent carbon-skeleton mutases, we compare the G3(MP2)-RAD(p) barrier heights for the different pathways considered in the present work as a function of the migrating group (Figure 4).
[Figure 4 plot not reproduced. Barrier (kJ mol-1) for each migrating group. Fragmentation-recombination: CH=CH2 (150.9), CH=NH (118.0), CH=O (95.2), CH2-NH2 (97.2). Addition-elimination: CH=CH2 (46.3), CH=NH (52.4), CH=O (53.0). Protonation: CH=CH2 (8.2), CH=NH (19.0), CH=O (13.9).]
Figure 4. Comparison of the G3(MP2)-RAD(p) energy requirements for the fragmentation-recombination, addition-elimination and protonated pathways for the model systems of B12-dependent carbon-skeleton mutases with migrating groups CH=CH2, CH=NH, CH=O and CH2-NH2.
Some important trends are apparent in Figure 4. First, the fragmentation-recombination barriers for the model systems are consistently high, between approximately 95 and 150 kJ mol-1. The fragmentation-recombination barrier depends on the migrating group, decreasing in the order CH=CH2 > CH=NH > CH2-NH2 > CH=O. This
trend presumably reflects differences in the stability of the radical fragment in the high-energy route. Since the barrier heights for B12-assisted 1,2-shifts are estimated from the reaction rates to fall within or below a range of 50 to 75 kJ mol-1 (see Section 2.4), the enzyme would have to reduce the activation barrier for the fragmentation-recombination route substantially for this mechanism to be plausible. How the enzyme could perform such a feat is not immediately apparent. Although the calculations on the model systems cannot be used to rule out the fragmentation-recombination pathway, the high barrier implies that an alternative mechanism may be important. The intramolecular addition-elimination mechanism, which is possible when the migrating group is unsaturated, provides a lower-energy pathway than fragmentation-recombination. Whether the three-membered cyclic structure associated with this pathway is a transition structure (3-propanal radical rearrangement) or a stable intermediate (but-3-enyl and iminopropyl radical rearrangements) does not affect this general conclusion. The barriers for the addition-elimination route lie between 46 and 53 kJ mol-1, significantly less than those calculated for the fragmentation-recombination pathway. A pathway involving a cyclic intermediate could not be characterized for the CH2-NH2 migrating group, possibly because of the high energy expected for such a structure. The addition-elimination barriers fall within the range estimated for B12-dependent 1,2-shifts, and the pathway becomes energetically even more favorable when the migrating group is protonated. In fact, the barrier heights for the protonated pathways of all the model systems with unsaturated migrating groups fall below 20 kJ mol-1. This protonation alternative is very appealing and could provide a clue as to how these demanding reactions are catalyzed by the enzymes.
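The comparison in Figure 4 amounts to ranking the model barriers and screening them against the upper end of the 50-75 kJ mol-1 range estimated for the enzymatic 1,2-shifts. The Python sketch below does exactly that and nothing more; `BARRIERS`, `compatible_pathways` and `fragmentation_order` are hypothetical names, and treating gas-phase model barriers as directly comparable with the enzymatic estimate is the same crude assumption made in the text, not a kinetic model.

```python
# G3(MP2)-RAD(p) barriers (kJ/mol) by migrating group, as plotted in Figure 4.
# None marks a pathway that was not characterized (no cyclic structure for CH2-NH2).
BARRIERS = {
    "CH=CH2": {"fragmentation-recombination": 150.9,
               "addition-elimination": 46.3, "protonated": 8.2},
    "CH=NH": {"fragmentation-recombination": 118.0,
              "addition-elimination": 52.4, "protonated": 19.0},
    "CH=O": {"fragmentation-recombination": 95.2,
             "addition-elimination": 53.0, "protonated": 13.9},
    "CH2-NH2": {"fragmentation-recombination": 97.2,
                "addition-elimination": None, "protonated": None},
}

# Upper end of the 50-75 kJ/mol range estimated for B12-assisted 1,2-shifts.
ENZYMATIC_UPPER_BOUND = 75.0


def compatible_pathways(barriers, bound=ENZYMATIC_UPPER_BOUND):
    """Map each migrating group to the pathways whose barrier does not exceed bound."""
    return {
        group: sorted(p for p, e in paths.items() if e is not None and e <= bound)
        for group, paths in barriers.items()
    }


def fragmentation_order(barriers):
    """Migrating groups sorted by decreasing fragmentation-recombination barrier."""
    return sorted(barriers,
                  key=lambda g: barriers[g]["fragmentation-recombination"],
                  reverse=True)


print(fragmentation_order(BARRIERS))
print(compatible_pathways(BARRIERS))
```

The ranking reproduces the order CH=CH2 > CH=NH > CH2-NH2 > CH=O, and no fragmentation-recombination barrier passes the screen, whereas every characterized addition-elimination and protonated pathway does.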
The applicability of this finding to enzyme catalysis is considered in the following section.

8. THE PARTIAL-PROTON-TRANSFER CONCEPT
The results for the models of carbon-skeleton rearrangements suggest that protonation of the migrating group would facilitate the reactions. However, the concept of substrate protonation, while energetically attractive, carries with it the problem that substantial protonation of a weak base is difficult to achieve with the weakly acidic groups available to enzymes [85]. For example, the pKa of the conjugate acid of a thioester carbonyl oxygen is estimated to be around -6 [86], so even the strongest conceivable acid in a protein cannot be expected to generate a substantial concentration of protonated substrate in the reaction catalyzed by methylmalonyl-CoA mutase (reaction 6). Owing to the problems associated with mechanisms involving full protonation, we have considered whether partial-proton-transfer would be sufficient to activate the migrating group [87]. To investigate such behavior, we examined the interaction of the 3-propanal radical with a set of representative acids (NH4+, HF and H3O+) using the CBS-RAD(p) method [69]. This choice encompasses a wide range of acid strengths, as measured by the proton affinities (PAs) of the conjugate bases (F- = 1556.0, NH3 = 848.6 and H2O = 680.1 kJ mol-1) [88]. The geometries of the substrate-acid complexes indicate partial-proton-transfer through changes in the C=O and acidic proton (X-H) bond lengths. The distance between the acidic proton and the carbonyl oxygen of the 3-propanal radical in the relevant complexes is the most direct measure of the degree of proton transfer to oxygen. We find that this distance decreases across the acid series from infinity (no protonation) to 1.727 Å (HF), 1.503 Å (NH4+), 1.046 Å (H3O+) and 0.976 Å (full protonation). A similar trend (but in the opposite direction) is found for the C=O bond length, with HF causing a slight lengthening to 1.221 Å and NH4+ to 1.235 Å, while H3O+ has the largest effect, with a calculated carbonyl bond length of 1.273 Å. The same monotonic trend is found for several of the other geometric parameters. The most striking consequence of the transition from no protonation to complete protonation of the carbonyl group in the 3-propanal radical is the monotonic lowering of the barrier to migration of the formyl group (see Figure 5) [69, 89].
[Figure 5 plot not reproduced. Schematic energy profiles from reactant to product, with barriers (kJ mol-1): no protonation (46.9), HF (41.4), NH4+ (24.5), full protonation (10.0).]
Figure 5. Schematic CBS-RAD(p) energy profiles showing barriers (kJ mol-1) for the rearrangement of the 3-propanal radical (8) assisted by acids of varying strength.
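The monotonic relation between the degree of proton transfer and the migration barrier can be checked directly from the numbers given above. The Python sketch below pairs each O···H distance with its CBS-RAD(p) barrier (the H3O+ barrier of 10.3 kJ mol-1 is the value quoted in the following paragraph) and expresses each lowering as a fraction of the full-protonation lowering. `SERIES` and `is_strictly_decreasing` are hypothetical names, and `math.inf` is used as a stand-in for "no acid present".

```python
import math

# CBS-RAD(p) data for the 3-propanal radical interacting with acids (Section 8).
# r_OH: distance (angstrom) between the acidic proton and the carbonyl oxygen;
# barrier: migration barrier (kJ/mol). math.inf stands for "no acid present".
SERIES = [
    # (label, r_OH, barrier)
    ("no protonation", math.inf, 46.9),
    ("HF", 1.727, 41.4),
    ("NH4+", 1.503, 24.5),
    ("H3O+", 1.046, 10.3),
    ("full protonation", 0.976, 10.0),
]


def is_strictly_decreasing(xs):
    return all(a > b for a, b in zip(xs, xs[1:]))


distances = [r for _, r, _ in SERIES]
barriers = [e for _, _, e in SERIES]

# Shorter O...H distances (more proton transfer) go with lower barriers.
print("distance decreases monotonically:", is_strictly_decreasing(distances))
print("barrier decreases monotonically:", is_strictly_decreasing(barriers))

full_lowering = barriers[0] - barriers[-1]
for label, _, e in SERIES:
    fraction = (barriers[0] - e) / full_lowering
    print(f"{label:17s} {fraction:5.2f} of the full-protonation lowering")
```

On these numbers NH4+ already delivers roughly 60% of the barrier lowering obtained with full protonation, while HF delivers about 15%.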
As might have been expected from the barriers in the extreme cases (Figure 4 and Tables 2-4), a greater degree of proton transfer is associated with a lower barrier to rearrangement. The acidity of H3O+ is sufficient to result in a barrier (10.3 kJ mol-1) virtually identical to that calculated for full protonation, while the barrier with HF as the acid (41.4 kJ mol-1) shows that even a small amount of proton transfer can significantly decrease the barrier for migration. With the ammonium ion, the moderately high proton affinity of ammonia keeps the proton relatively strongly bound while allowing sufficient proton transfer to facilitate the rearrangement, reducing the barrier to 24.5 kJ mol-1. In the context of enzymatic catalysis, this situation might be regarded as ideal, since significant barrier lowering can be achieved without deprotonation of the enzyme. It is possible to phrase the partial-proton-transfer concept in the same language of hydrogen bonding that has been employed in the current debate as to whether
or not "low-barrier" hydrogen bonds (LBHBs) or "short strong" hydrogen bonds (SSHBs) can be important in enzymatic catalysis [90]. The discussion has focused on concepts such as the pKa matching of the H-bond donor and acceptor [91], the position of the shared proton [92], the distance between the donor and acceptor atoms [93, 94], the strength of the hydrogen bond [94, 95] and the non-existence of "short-strong" H-bonds under certain solvation conditions [96, 97]. We believe that our results and their interpretation make an important contribution to this debate and that it is conceptually instructive to examine the "low-barrier" hydrogen-bonding hypothesis in terms of partial-proton-transfers. The lowering of a reaction barrier by protonation is equivalent to saying that the transition structure interacts more favorably with the proton than does the reactant. For example, at the CBS-RAD(p) level, the energy of the transition structure (TS:8→8') is lowered by 825.1 kJ mol-1 upon protonation, while the 3-propanal radical (8) has a proton affinity of 788.2 kJ mol-1. The difference between these two energies, 36.9 kJ mol-1, is exactly the reduction in barrier associated with protonation. The same concept applies to partial protonation. We note first that the gas-phase hydrogen bond between the 3-propanal radical and NH4+ is quite strong (96.9 kJ mol-1), despite the fact that the proton transfer between donor and acceptor is described by a single, asymmetric energy well. However, the 22.3 kJ mol-1 lowering of the rearrangement barrier by NH4+ (corresponding to a rate increase of roughly four orders of magnitude at room temperature) comes not from the strength of this hydrogen bond but from the fact that the interaction between NH4+ and the transition structure (119.2 kJ mol-1) is 22.3 kJ mol-1 stronger than its interaction with the reactant, owing to the higher 'proton affinity' of the former species.
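The argument above is a closed thermodynamic cycle: the barrier drops by exactly the amount by which the proton (or hydrogen-bond donor) binds the transition structure more strongly than the reactant. The Python sketch below evaluates that cycle with the CBS-RAD(p) numbers given in the text and converts a barrier reduction into a rate ratio. `barrier_lowering` and `rate_ratio` are hypothetical names; the rate ratio assumes an unchanged pre-exponential factor, so that the ratio is exp(ΔΔE/RT), and the default temperature of 298.15 K is an assumption rather than a value from the text.

```python
import math

R = 8.314462618e-3  # molar gas constant, kJ/(mol K)


def barrier_lowering(binding_to_ts, binding_to_reactant):
    """Barrier reduction (kJ/mol) from the thermodynamic cycle: the donor lowers
    the barrier by the amount it binds the TS more strongly than the reactant."""
    return binding_to_ts - binding_to_reactant


def rate_ratio(delta_barrier, temperature=298.15):
    """Rate enhancement exp(dE / RT), assuming an unchanged pre-exponential factor."""
    return math.exp(delta_barrier / (R * temperature))


# Full protonation: proton affinities of TS:8->8' and of radical 8.
full = barrier_lowering(825.1, 788.2)
# Partial proton transfer: NH4+ interaction energies with TS:8->8' and with 8.
partial = barrier_lowering(119.2, 96.9)

print(f"full protonation lowering: {full:.1f} kJ/mol")
print(f"NH4+ lowering:             {partial:.1f} kJ/mol")
print(f"NH4+ rate ratio at 298 K:  {rate_ratio(partial):.1e}")
```

The cycle returns 36.9 and 22.3 kJ mol-1, matching the barrier reductions quoted above; under the stated assumptions the NH4+ value corresponds to a factor of about 8 × 10^3 at 298.15 K, and the factor grows as the temperature is lowered.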
This reasoning is supported by the geometric parameters, in that the degree of proton transfer to the transition structure is greater than it is to the reactant. In an enzymatic reaction facilitated by protonation, the proton-accepting site will generally carry a small negative charge, making it a good candidate for binding to a proton donor in the protein via a hydrogen bond. If such a hydrogen bond exists and remains intact during the course of the reaction, then, regardless of the strength of the H-bond donor, the barrier will be lowered simply because the transition structure interacts more strongly with the proton than does the substrate. The transition from a "weak" hydrogen bond to a "short-strong" hydrogen bond is continuous and, regardless of where a particular H-bonding interaction lies on this scale, partial-proton-transfer will contribute to the lowering of the barrier. Our thesis is simple: any reaction that is facilitated by protonation will also be facilitated (to a lesser extent) by the partial-proton-transfer that enzymatic hydrogen bonding can provide. A given partial-proton-transfer is unlikely to be particularly efficient in aqueous solution. In much the same way as has been argued in the context of the LBHB hypothesis [97], the hydrogen-bond donor/acceptor properties of water and the entropic disorder associated with such a solution are likely to disrupt the hydrogen bond. However, the active sites of many enzymes are sequestered from bulk water, at least to some extent, and may therefore provide
an environment well suited to hydrogen bonding that is relatively undisrupted by solvent. In particular, the active site of methylmalonyl-CoA mutase has been shown to be deeply buried and largely inaccessible to solvent [37], seemingly providing such a tailored environment. Furthermore, the X-ray crystal structures [37, 38] indicate that an active-site histidine residue (His244) is in a position to bind the carbonyl oxygen of the substrate by means of a hydrogen bond. We suggest that this hydrogen bond not only serves to bind the substrate but also provides partial-proton-transfer for catalysis. In this way, the enzyme can take advantage of the proton-induced barrier-lowering that is available for the intramolecular rearrangement, without resorting to the extreme of full protonation. Although we have only discussed the partial-proton-transfer model for one of the B12-dependent carbon-skeleton mutases [69], we expect that a continuum between no protonation and full protonation of the substrate also exists in the other reactions. Analogously, partial hydride removal from the substrate of glutamate mutase may serve to facilitate this rearrangement.
9. CONCLUSIONS
The main focus of the current chapter has been to gain a greater understanding of the radical rearrangement step in reactions catalyzed by B12-dependent carbon-skeleton mutases. Through a 'model system' approach, estimates of the barrier heights associated with a variety of radical rearrangement pathways were obtained from high-level molecular orbital calculations. General trends through a series of model B12-assisted carbon-skeleton rearrangements are apparent. Foremost, a recently suggested mechanism involving complete detachment of the migrating group from the rest of the molecule (i.e., fragmentation-recombination) is found to require significantly more energy than an intramolecular pathway (i.e., addition-elimination).
In addition, protonation of the substrate reduces the addition-elimination barrier height, thus identifying a way for the enzyme to facilitate these otherwise energetically demanding reactions. Although full protonation may not be feasible within the enzymatic environment, our calculations show that partial-proton-transfer from the enzyme to the substrate can provide a significant reduction in the energy requirement for the rearrangement. The X-ray crystal structure provides evidence that this mechanism is plausible for the reaction catalyzed by methylmalonyl-CoA mutase. Although the effects of the carboxylate groups and other substituents were not discussed in this article, they represent an important area of ongoing research. Accounting for these groups can provide additional information about the enzyme-catalyzed reactions, such as their exothermicity and stereochemistry. Preliminary results indicate that, although the details differ between models that neglect the carboxylate groups and models that include them, in most cases the small models provide an adequate description of the gas-phase rearrangements. In some instances the magnitude of the barrier reduction between different pathways is diminished, but it is nevertheless still present. Despite the abundance of literature on coenzyme-B12 and the reactions it catalyzes, the field remains open. We hope that the present article provides useful
insights into the substrate chemistry in B12-dependent carbon-skeleton rearrangements and allows informed speculation about the function of the related enzymes.
ACKNOWLEDGEMENTS We thank Professor Bernard Golding for his insightful contributions to our general program of theoretical studies of enzyme-catalyzed reactions, and thank the Australian National University Supercomputing Facility for generous allocations of computer resources.
REFERENCES
1. B. Krautler, D. Arigoni and B. T. Golding, Vitamin B12 and B12-Proteins; Wiley-VCH: Weinheim, 1998. 2. (a) E. L. Rickes, N. G. Brink, F. R. Koniuszy, T. R. Wood and K. Folkers, Science, 107 (1948) 396. (b) E. L. Smith and L. F. Parker, Biochem. J., 43 (1948) 7. 3. R. West, Science, 107 (1948) 398. 4. H. A. Barker, H. Weissbach and R. D. Smyth, Proc. Natl. Acad. Sci. USA, 44 (1958) 1093. 5. B. T. Golding, Chem. Br., 26 (1990) 950. 6. B. T. Golding and W. Buckel, Comprehensive Biological Catalysis, M. L. Sinnott (ed.), Academic Press, London, 1997, Vol. III, pp 239. 7. (a) D. C. Hodgkin, J. Kamper, M. Mackay, J. Pickworth, K. N. Trueblood and J. G. White, Nature, 178 (1956) 64. (b) P. G. Lenhart and D. C. Hodgkin, Nature, 161 (1961) 937. 8. (a) K. Bernhauer, O. Muller and F. Wagner, Angew. Chem. Int. Ed. Engl., 3 (1964) 200. (b) A. Eschenmoser, Chem. Soc. Rev., 5 (1976) 377. (c) A. Eschenmoser, R. Scheffold, E. Bertele, M. Pesaro and H. Gschwend, Proc. Roy. Soc., 288 (1965) 306. (d) A. W. Johnson, Chem. Soc. Rev., 4 (1975) 1. (e) D. C. Black, V. M. Clark, B. G. Odell and L. Todd, J. Chem. Soc., Perkin Trans. I, (1976) 1944. (f) R. V. Stevens, Tetrahedron Lett., 32 (1976) 1599. (g) R. B. Woodward, Pure Appl. Chem., 33 (1973) 145. (h) A. Eschenmoser, Science, 196 (1977) 513. 9. (a) A. I. Scott, Tetrahedron Lett., 50 (1994) 13313. (b) F. Blanche, B. Cameron, J. Crouzet, L. Debussche, D. Thibaut, M. Vuilhorgne, F. J. Leeper and A. R. Battersby, Angew. Chem. Int. Ed. Engl., 34 (1995) 384. (c) A. R. Battersby and F. J. Leeper, Chem. Rev., 90 (1990) 1261. (d) P. M. Shoolingin-Jordan, J. Bioenerg. Biomembr., 27 (1995) 181. (e) P. Renz, B. Endres, B. Kurz and J. Marquat, Eur. J. Biochem., 217 (1993) 1117. 10. L. Ellenbogen and B. A. Cooper, Handbook of Vitamins, L. J. Machlin (ed.), Marcel Dekker, New York & Basel, 1991, pp 491. 11. R. H. Abeles, Proceedings Robert A Welch Foundation, Conf. Chem. Res., Vol XV, BioOrganic Chemistry and Mechanisms, W. O. Milligan (ed.), 1972, pp 95. 12. (a) P. Dowd and R. Hershline, J. Chem. Soc. Perkin Trans. 2, (1988) 61. (b) H. Kung and T. C. Stadtman, J. Biol. Chem., 246 (1971) 3378. (c) G. Hartrampf and W. Buckel, Eur. J. Biochem., 156 (1986) 301. 13. H. Eggerer, E. R. Stadtman, P. Overath and F. Lynen, Biochem. Z., 333 (1960) 1. 14. H. A. Barker, V. Roose, F. Suzuki and A. A. Iodice, J. Biol. Chem., 242 (1967) 878. 15. A. Munch-Peterson and H. A. Barker, J. Biol. Chem., 230 (1958) 649. 16. W. Buckel and B. T. Golding, Chem. Soc. Rev., 26 (1996) 329. 17. (a) M. Michenfelder, W. E. Hull and J. Rétey, Eur. J. Biochem., 168 (1987) 659. (b) G. C. Hall, Proc. Roy. Soc. (London), A205 (1951) 541. 18. The enzyme-coenzyme partnership involving diol dehydratase catalyzes the dehydration of both ethane-1,2-diol and propane-1,2-diol. 19. J. Rétey, A. Umani-Ronchi, J. Seibl and D. Arigoni, Experientia, 22 (1966) 502. 20. D. M. Smith, B. T. Golding and L. Radom, J. Am. Chem. Soc., 121 (1999) 5700.
21. D. Smith, B. Golding and L. Radom, submitted for publication. 22. J. Stubbe, Biol. Chem., 265 (1990) 5330. 23. B. T. Golding and L. Radom, J. Am. Chem. Soc., 98 (1976) 6331. 24. (a) B. M. Babior, Acc. Chem. Res., 8 (1975) 376. (b) J. Rétey, Recent Adv. Phytochem., 13 (1979) 1. (c) B. T. Golding, B12, D. Dolphin (ed.), J. Wiley & Sons, New York, 1982, Vol. 1, pp 543. 25. R. G. Finke, D. A. Schiraldi and B. J. Mayer, Coord. Chem. Rev., 54 (1984) 1. 26. J. Halpern, Science, 227 (1985) 869. 27. (a) J. Rétey, Angew. Chem. Int. Ed. Engl., 29 (1990) 355. (b) B. Zagalak and W. Friedrich, Vitamin B12. Proceedings of the Third European Symposium on Vitamin B12 and Intrinsic Factor; Walter de Gruyter: New York, 1979. (c) D. Dolphin, B12; Wiley-Interscience, New York, 1982, Vol. 1 and 2. (d) R. H. Abeles and D. Dolphin, Acc. Chem. Res., 9 (1976) 114. B. M. Babior, Biofactors, 1 (1988) 21. (e) B. Krautler, Cobalt, B12-Enzymes and Coenzymes, Encyclopedia of Inorganic Chemistry, John Wiley & Sons, Chichester, England, 1994, Vol. 2. 28. R. G. Finke, Molecular Mechanisms in Bioorganic Processes, C. Bleasdale and B. T. Golding (eds.), The Royal Society of Chemistry, Cambridge, 1990, pp 281. 29. P. Dowd, Selective Hydrocarbon Activation, J. A. Davies, P. L. Watson, J. F. Liebman and A. Greenberg (eds.), VCH, New York, 1990, pp 265. 30. M. L. Ludwig and R. G. Matthews, Ann. Rev. Biochem., 66 (1997) 269. 31. (a) W. H. Orme-Johnson, H. Beinert and R. L. Blakley, J. Biol. Chem., 249 (1974) 2338. (b) S. A. Cockle, H. A. O. Hill, R. J. P. Williams, S. P. Davies and M. A. Foster, J. Am. Chem. Soc., 94 (1972) 275. 32. K. N. Joblin, A. W. Johnson, M. F. Lappert, M. R. Hollaway and H. A. White, FEBS Lett., 53 (1975) 193. 33. (a) H. F. Kung and L. Tsai, J. Biol. Chem., 246 (1971) 6436. (b) R. H. Abeles and B. Zagalak, J. Biol. Chem., 241 (1966) 1245. (c) P. A. Frey and R. H. Abeles, J. Biol. Chem., 241 (1966) 2732. (d) P. A. Frey, M. K. Essenberg and R. H. Abeles, J. Biol. Chem., 242 (1967) 5369. (e) J. Rétey and D. Arigoni, Experientia, 24 (1966) 783. (f) R. L. Switzer, B. G. Baltimore and H. A. Barker, J. Biol. Chem., 244 (1969) 5263. (g) B. M. Babior, Biochim. Biophys. Acta, 167 (1968) 456. (h) J. Rétey, F. Kunz, T. C. Stadtman and D. Arigoni, Experientia, 25 (1968) 802. 34. T. W. Meier, N. H. Thoma and P. F. Leadlay, Biochemistry, 35 (1996) 11791. 35. S. D. Wetmore, D. M. Smith and L. Radom, work in progress. 36. (a) C. Luschinsky-Drennan, S. Huang, J. T. Drummond, R. G. Matthews and M. L. Ludwig, Science, 266 (1994) 1669. (b) N. Shibata, J. Masuda, T. Tobimatsu, T. Toraya, K. Suto, Y. Morimoto and N. Yasuoka, Structure, 7 (1999) 997. 37. F. Mancia, N. H. Keep, A. Nakagawa, P. F. Leadlay, S. McSweeney, B. Rasmussen, P. Bosecke, O. Diat and P. R. Evans, Structure, 4 (1996) 339. 38. (a) F. Mancia and P. R. Evans, Structure, 6 (1998) 711. (b) N. H. Thoma, T. W. Meier, P. R. Evans and P. F. Leadlay, Biochemistry, 37 (1998) 14386. 39. R. Reitzer, K. Gruber, G. Jogl, U. G. Wagner, H. Bothe, W. Buckel and C. Kratky, Structure, 7 (1999) 891. 40. (a) Y. Zhao, P. Such and J. Rétey, Angew. Chem. Int. Ed. Engl., 31 (1992) 215. (b) M. G. N. Hartmanis and T. C. Stadtman, Proc. Natl. Acad. Sci., 84 (1987) 76. (c) O. Zelder, B. Beatrix, U. Leutbecher and W. Buckel, Eur. J. Biochem., 226 (1994) 577. (d) B. M. Babior, T. H. Moss, W. H. Orme-Johnson and H. Beinert, J. Biol. Chem., 249 (1974) 4537. (e) O. Zelder and W. Buckel, Biol. Chem. Hoppe-Seyler, 374 (1993) 84. (f) Y. Zhao, A. Abend, M. Kunz, P. Such and J. Rétey, Eur. J. Biochem., 225 (1994) 891. 41. R. Padmakumar and R. Banerjee, J. Biol. Chem., 270 (1995) 9295. 42. R. Padmakumar, S. Taoka, R. Padmakumar and R. Banerjee, J. Am. Chem. Soc., 117 (1995) 7033. 43. H. Bothe, D. J. Darley, S. P. J. Albracht, G. J. Gerfen, B. T. Golding and W. Buckel, Biochemistry, 37 (1998) 4105.
44. (a) M. He and P. Dowd, J. Am. Chem. Soc., 120 (1998) 1133. (b) R. B. Silverman and D. Dolphin, J. Am. Chem. Soc., 98 (1976) 4626. (c) W. A. Mulac and D. Meyerstein, J. Am. Chem. Soc., 104 (1982) 4124. (d) R. G. Finke, W. P. McKenna, D. A. Schiraldi, B. L. Smith and C. Pierpont, J. Am. Chem. Soc., 105 (1983) 7592. 45. A. Greenberg and J. F. Liebman, Energetics of Organic Free Radicals, J. A. M. Simoes, A. Greenberg and J. F. Liebman (eds.), Blackie Academic and Professional, London, 1996, Vol. 4, pp 224. 46. J. J. Russell, H. S. Rzepa and D. A. Widdowson, J. Chem. Soc., Chem. Commun. (1983) 625. 47. B. Beatrix, O. Zelder, F. K. Kroll, G. Orlygsson, B. T. Golding and W. Buckel, Angew. Chem. Int. Ed. Engl., 34 (1995) 2398. 48. J. Pacansky, R. J. Waltman and L. A. Barnes, J. Phys. Chem., 97 (1993) 10694. 49. T. Hoz, M. Sprecher and H. Basch, J. Phys. Chem., 89 (1985) 1664. 50. D. A. Lindsay, J. Lusztyk and K. U. Ingold, J. Am. Chem. Soc., 106 (1984) 7087. 51. T. Hoz, M. Sprecher and H. Basch, J. Mol. Struct. Theochem, 150 (1987) 51. 52. P. George, J. P. Glusker and C. W. Bock, J. Am. Chem. Soc., 119 (1997) 7065. 53. D. E. Holloway and E. N. G. Marsh, J. Biol. Chem., 269 (1994) 20425. 54. (a) W. W. Bachovchin, R. G. E. Jr., K. W. Moore and J. H. Richards, Biochemistry, 16 (1977) 1082. (b) B. M. Babior, B12, D. Dolphin (ed.), John Wiley & Sons, New York, Vol. 2, pp 263. 55. (a) M. W. Wong and L. Radom, J. Phys. Chem., 99 (1995) 8582. (b) P. M. Mayer, C. J. Parkinson, D. M. Smith and L. Radom, J. Chem. Phys., 108 (1998) 604. (c) P. J. Knowles, S. J. Andrews, R. D. Amos, N. C. Handy and J. A. Pople, Chem. Phys. Lett., 186 (1991) 130. (d) H. B. Schlegel, J. Chem. Phys., 84 (1986) 4530. (e) J. F. Stanton, J. Chem. Phys., 101 (1994) 371. 56. M. W. Wong and L. Radom, J. Phys. Chem., 102 (1998) 2237. 57. S. Wollowitz and J. Halpern, J. Am. Chem. Soc., 110 (1988) 3112. 58. M. Newcomb, Tetrahedron, 49 (1993) 1151. 59. F. N. Martinez, H. B. Schlegel and M. Newcomb, J. Org. Chem., 61 (1996) 8547. 60. M. Newcomb and A. G. Glenn, J. Am. Chem. Soc., 111 (1989) 275. 61. (a) A. Effio, D. Griller, K. U. Ingold, A. L. J. Beckwith and A. K. Serelis, J. Am. Chem. Soc., 102 (1980) 1734. (b) B. Maillard, D. Forrest and K. U. Ingold, J. Am. Chem. Soc., 98 (1976) 7024. (c) D. F. McMillen, D. M. Golden and S. W. Benson, Int. J. Chem. Kinet., 3 (1971) 358. (d) J. D. Cox and G. Pilcher, Thermochemistry of Organic and Organometallic Compounds, Academic Press, New York, 1970. (e) W. Tsang, J. Am. Chem. Soc., 107 (1985) 2872. 62. D. M. Smith, A. Nicolaides, B. T. Golding and L. Radom, J. Am. Chem. Soc., 120 (1998) 10223. 63. J. A. Montgomery, Jr., M. J. Frisch, J. W. Ochterski and G. A. Petersson, J. Chem. Phys., 110 (1999) 2822, and references therein. 64. L. A. Curtiss, K. Raghavachari, P. C. Redfern, V. Rassolov and J. A. Pople, J. Chem. Phys., 109 (1998) 7764, and references therein. 65. A. M. Mebel, K. Morokuma and M. C. Lin, J. Chem. Phys., 103 (1995) 7414. 66. (a) C. J. Parkinson, P. M. Mayer and L. Radom, Theor. Chem. Acc., 102 (1999) 92. (b) C. J. Parkinson and L. Radom, work in progress. 67. D. M. Smith, B. T. Golding and L. Radom, J. Am. Chem. Soc., 121 (1999) 1037. 68. H. F. Kung, S. Cederbaum, L. Tsai and T. C. Stadtman, Proc. Natl. Acad. Sci. USA, 65 (1970) 978. 69. D. M. Smith, B. T. Golding and L. Radom, J. Am. Chem. Soc., 121 (1999) 9388. 70. J. M. Roscoe, I. S. Jayaweera, A. L. Mackenzie and P. D. Pacey, Int. J. Chem. Kinet., 28 (1996) 181. 71. B. T. Golding and L. Radom, J. Chem. Soc. Chem. Commun. (1973) 939. 72. (a) X. Z. Qin and F. Williams, Tetrahedron Lett., 42 (1986) 6301. (b) S. Lunell, I. Yin and M. B. Huang, Chem. Phys., 139 (1989) 293. (c) L. W. Sieck, R. Gordon and P. Ausloos, J. Am. Chem. Soc., 94 (1972) 7157.
73. (a) K. Krogh-Jespersen and H. D. Roth, J. Am. Chem. Soc., 114 (1992) 8388. (b) A. Skancke, J. Phys. Chem., 99 (1995) 13886. (c) P. Du, D. A. Hrovat and W. T. Borden, J. Am. Chem. Soc., 110 (1988) 3405. 74. J. Rétey, B12, D. Dolphin (ed.), John Wiley & Sons, New York, 1982, Vol. 2, pp 357. 75. J. Rétey and J. A. Robinson, Stereospecificity in Organic Chemistry and Enzymology, H. F. Ebel (ed.), Verlag Chemie, Weinheim, 1982, pp 185. 76. For simplicity, we refer to structure 8 as the 3-propanal radical. This species may be better named as the 3-oxoprop-1-yl radical. 77. B. Giese and H. Horler, Tetrahedron Lett., 24 (1983) 3221. 78. The quoted barrier corresponds to the energy of the symmetrical structure which, after inclusion of the zero-point energy, is higher than the two non-symmetrical transition structures. 79. This same shift has been investigated previously in the context of mass spectrometry experiments using lower-level molecular orbital calculations and mass spectrometry: G. Bouchoux, A. Luna and J. Tortajada, Int. J. Mass Spectrom. Ion Proc., 167 (1997) 353. 80. (a) H. A. Barker, R. D. Smyth and R. M. Wilson, Fed. Proc., 17 (1958) 185. (b) H. A. Barker, R. D. Smyth, E. J. Wawszkiewicz, M. N. Lee and R. M. Wilson, Arch. Biochem. Biophys., 78 (1958) 468. (c) H. A. Barker, R. D. Smyth, E. J. Wawszkiewicz, A. Munch-Peterson, J. I. Toohey, J. N. Ladd, B. E. Volcani and R. M. Wilson, J. Biol. Chem., 235 (1960) 181. (d) W. Buckel and H. A. Barker, J. Bacteriol., 117 (1974) 1248. (e) W. Buckel, Arch. Microbiol., 127 (1980) 167. 81. M. Brecht, J. Kellermann and A. Plückthun, FEBS Lett., 319 (1993) 84. 82. (a) P. Dowd, S. Choi, F. Duah and C. Kaufman, Tetrahedron Lett., 44 (1988) 2137. (b) S. Choi and P. Dowd, J. Am. Chem. Soc., 111 (1989) 2313. 83. J. Baker and T. Stadtman, B12, D. Dolphin (ed.), John Wiley and Sons, New York, 1982, Vol. 2, pp 203. 84. (a) U. Leutbecher, R. Böcher, D. Linder and W. Buckel, Eur. J. Biochem., 205 (1992) 759. (b) F. Suzuki and H. A. Barker, J. Biol. Chem., 241 (1965) 878. 85. A. Thibblin and W. P. Jencks, J. Am. Chem. Soc., 101 (1979) 4963. 86. J. T. Edward, S. C. Wong and G. Welch, Can. J. Chem., 59 (1978) 931. 87. D. M. Smith, B. T. Golding and L. Radom, J. Am. Chem. Soc., 121 (1999) 1383. 88. These species, although not physiologically significant themselves, were chosen to demonstrate how the migration behavior depends on the strength of the interacting acid. On this basis, the amino acids His-H+ and Lys-H+ could be expected to show behavior similar to NH4+, while Asp and Glu should be closer to H3O+, and Cys and Tyr closer to HF. 89. The rearrangement assisted by HF has the same electronic profile as the uncatalyzed pathway. That is, the symmetrical structure corresponds to a minimum on the vibrationless potential energy surface that disappears upon inclusion of zero-point energy. With either NH4+ or H3O+ as the acid, the symmetrical species is a transition structure on the vibrationless surface. 90. (a) J. A. Gerlt and P. G. Gassman, J. Am. Chem. Soc., 115 (1993) 11552. (b) W. W. Cleland and M. M. Kreevoy, Science, 264 (1994) 1887. (c) P. A. Frey, S. A. Whitt and J. B. Tobin, Science, 264 (1994) 1927. 91. (a) S. Scheiner and T. Kar, J. Am. Chem. Soc., 117 (1995) 6970. (b) S. Shan, S. Loh and D. Herschlag, Science, 272 (1996) 97. (c) M. Garcia-Viloca, A. Gonzalez-Lafont and J. M. Lluch, J. Phys. Chem. A, 101 (1997) 3880. 92. (a) E. L. Ash, J. L. Sudmeier, E. C. DeFabo and W. W. Bachovchin, Science, 278 (1997) 1128. (b) M. E. Tuckerman, D. Marx, M. L. Klein and M. Parrinello, Science, 275 (1997) 817. (c) C. L. Perrin and J. B. Nielson, Annu. Rev. Phys. Chem., 48 (1997) 511. (d) C. L. Perrin, J. B. Nielson and Y. Kim, Ber. Bunsenges. Phys. Chem., 102 (1998) 403. 93. P. Gilli, V. Bertolasi, V. Ferretti and G. Gilli, J. Am. Chem. Soc., 116 (1994) 909. 94. J. P. Guthrie, Chem. Biol., 3 (1996) 163. 95. (a) Y. Pan and M. McAllister, J. Am. Chem. Soc., 120 (1998) 166. (b) Y. Pan and M. McAllister, J. Am. Chem. Soc., 119 (1997) 7561.
96. A. Warshel, A. Papazyan and P. A. Kollman, Science, 269 (1995) 102. 97. A. Warshel and A. Papazyan, Proc. Natl. Acad. Sci. USA, 93 (1996) 13665.
L. A. Eriksson (Editor), Theoretical Biochemistry: Processes and Properties of Biological Systems
Theoretical and Computational Chemistry, Vol. 9
© 2001 Elsevier Science B.V. All rights reserved

Chapter 6
SIMULATIONS OF ENZYMATIC SYSTEMS: PERSPECTIVES FROM CAR-PARRINELLO MOLECULAR DYNAMICS SIMULATIONS

Paolo Carloni (1, 2) and Ursula Rothlisberger (3)
1 International School of Advanced Studies and INFM-Istituto Nazionale di Fisica della Materia, I-34014 Trieste, Italy
2 International Centre for Genetic Engineering and Biotechnology, I-34012 Trieste, Italy
3 Laboratory of Inorganic Chemistry, ETH Zurich, CH-8092 Zurich, Switzerland

1 INTRODUCTION
In 1985, Car and Parrinello introduced a new method [1] that merged two major fields of computational chemistry that had so far been essentially orthogonal: they combined electronic structure calculations based on density functional theory (DFT) with a classical molecular dynamics (MD) scheme. The method makes it possible to perform parameter-free MD studies in which all interactions are calculated on the fly with an electronic structure method. These first-principles, or ab initio, molecular dynamics (AIMD) simulations are especially valuable for systems for which it is difficult (or impossible) to construct empirical potential energy functions. They are also a necessary prerequisite for studying processes that sample a wide range of chemical environments and thereby challenge the transferability of empirically derived potentials. Typical examples are the description of (transition) metal centers and of the forming and breaking of bonds in chemical reactions. The introduction of the Car-Parrinello method has not only extended the range of classical MD simulations based on empirical potentials, but has also significantly increased the capabilities of conventional electronic structure calculations. Through the combination with an MD method, a generalization to finite temperature and condensed-phase systems was achieved. Furthermore, a whole set of simulation tools based on statistical mechanics can thereby be applied in the context of an electronic structure method. Consequently, many dynamical as well as thermodynamic properties can be described within the accuracy of a first-principles method. AIMD was first applied to clusters [2, 3] and amorphous solids [4]; subsequently it also became a valuable and versatile tool for the study of materials [5] and of chemical reactions in the gas phase [6] and on surfaces [7].
A first step towards biological applications was taken in the early 1990s, when Parrinello, Car and co-workers demonstrated the power of the method in describing the structural, electronic and dynamical properties of liquid water and other H-bonded systems [8]. Not only did this work extend the domain of applications to solute/solvent interactions [9] and chemical reactions in aqueous solution [10], but it also provided a first basis for biological modeling and therefore represented a key step towards ab initio biosimulations. The first application to a biological system was performed in the mid-1990s on a gas-phase cluster model of the active site of superoxide dismutase [11]. Since then, a rapidly increasing number of applications to biological systems has been reported. In this article, we give an overview of the current status by briefly outlining the studies that have appeared in the literature so far and by presenting selected examples from our own work on enzymatic systems. This review is organized as follows. In Section 2, we describe the foundations of the method in its most widespread implementation, which is based on DFT, plane-wave basis sets and pseudopotentials. In Section 3, we outline the different approaches to AIMD modeling of biological systems. This is followed by a summary of the applications that have appeared so far (Section 4), with particular emphasis on enzymes (Section 5). Finally, in Section 6, we give an outlook on possible future directions for the investigation of enzymes and other fundamental classes of biomolecules.
2 PRINCIPLES OF THE CAR-PARRINELLO METHOD
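The central concept of AIMD as introduced by Car and Parrinello [1] is to treat the electronic degrees of freedom, described for example by one-electron wavefunctions $\psi_i$, as dynamical classical variables. The mixed system of nuclei and electrons is then described in terms of the extended classical Lagrangian $\mathcal{L}$:

$$\mathcal{L} = \mathcal{K}_N + \mathcal{K}_e - E_{\mathrm{pot}} \tag{1}$$

where $\mathcal{K}_N$ is the kinetic energy of the nuclei, $\mathcal{K}_e$ is the analogous term for the electronic degrees of freedom and $E_{\mathrm{pot}}$ is the potential energy, which is a function of both the nuclear positions $\mathbf{R}_I$ and the electronic variables $\psi_i$. The Lagrangian $\mathcal{L}$ can thus be written as:

$$\mathcal{L} = \sum_I \tfrac{1}{2} M_I \dot{\mathbf{R}}_I^2 + \sum_i \mu \int |\dot{\psi}_i(\mathbf{r})|^2 \, d\mathbf{r} - E\left[\{\psi_i\},\{\mathbf{R}_I\}\right] + \sum_{i,j} \Lambda_{ij} \left( \int \psi_i^*(\mathbf{r})\, \psi_j(\mathbf{r})\, d\mathbf{r} - \delta_{ij} \right) \tag{2}$$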
where $\Lambda_{ij}$ are Lagrange multipliers that ensure the orthonormality of the one-electron wavefunctions $\psi_i$, and $\mu$ is a fictitious mass associated with the electronic degrees of freedom. The Lagrangian in Eq. 2 determines the time evolution of a fictitious classical system in which nuclear positions as well as electronic degrees of freedom are treated as dynamical variables. The classical equations of motion (EOM) of this system are given by the Euler-Lagrange equations:

$$\frac{d}{dt} \frac{\partial \mathcal{L}}{\partial \dot{q}_i} = \frac{\partial \mathcal{L}}{\partial q_i} \tag{3}$$
where $q_i$ denotes a set of generalized coordinates. With the Lagrangian of Eq. 2, the EOM for the nuclear degrees of freedom become:

$$M_I \ddot{\mathbf{R}}_I = -\frac{\partial E}{\partial \mathbf{R}_I} \tag{4}$$

and those for the electronic degrees of freedom become:

$$\mu \ddot{\psi}_i = -\frac{\delta E}{\delta \psi_i^*} + \sum_j \Lambda_{ij} \psi_j \tag{5}$$

where the term with the Lagrange multipliers $\Lambda_{ij}$ describes the constraint forces that are needed to keep the wavefunctions orthonormal during the dynamics. The parameter $\mu$ is a purely fictitious variable and can be assigned an arbitrary value. In full analogy with the nuclear degrees of freedom, $\mu$ determines the rate at which the electronic variables evolve in time. In particular, the ratio of $M_I$ to $\mu$ characterizes the relative speed with which the electronic variables propagate with respect to the nuclear positions. For $\mu \ll M_I$, the electronic degrees of freedom adjust instantaneously to changes in the nuclear coordinates and the resulting dynamics is adiabatic. Under these conditions $\mathcal{K}_e \ll \mathcal{K}_N$, and the extended Lagrangian in Eq. 1 becomes identical to the physical Lagrangian of the system:

$$\mathcal{L} = \mathcal{K}_N - E_{\mathrm{pot}} \tag{6}$$
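The adiabatic limit described above can be illustrated with a deliberately minimal toy model. The Python sketch below is our own illustration and not part of any Car-Parrinello code. A single "nuclear" coordinate R with mass M is coupled to a single "electronic" coordinate c with fictitious mass mu through the hypothetical energy E(R, c) = kR²/2 + a(c - R)²/2. The minimum of this energy over c lies at c = R, which plays the role of the Born-Oppenheimer surface. Both coordinates are propagated with velocity Verlet according to the analogues of Eqs. 4 and 5; the orthonormality constraints have no counterpart in this one-variable model. The sketch also monitors the drift of K_N + E_pot, which is the energy associated with the physical Lagrangian of Eq. 6.

```python
def simulate_toy_cp(mu, M=1.0, k=1.0, a=1.0, dt=0.005, n_steps=4000):
    """Return (max |c - R|, max drift of K_N + E_pot) for a toy extended Lagrangian.

    Hypothetical model: E(R, c) = k R^2 / 2 + a (c - R)^2 / 2.
    R is the 'nuclear' coordinate (mass M); c is the 'electronic' one (mass mu).
    """

    def forces(R, c):
        # Generalized forces -dE/dR and -dE/dc.
        return -k * R + a * (c - R), -a * (c - R)

    def physical_energy(R, c, vR):
        # K_N + E_pot: the fictitious kinetic energy 0.5*mu*vc**2 is excluded.
        return 0.5 * M * vR**2 + 0.5 * k * R**2 + 0.5 * a * (c - R) ** 2

    R, c = 1.0, 1.0          # start on the toy BO surface (c optimized for this R)
    vR, vc = 0.0, 0.0
    fR, fc = forces(R, c)
    e0 = physical_energy(R, c, vR)
    max_lag = max_drift = 0.0
    for _ in range(n_steps):
        # One velocity Verlet step for the nuclear and the fictitious coordinate.
        vR += 0.5 * dt * fR / M
        vc += 0.5 * dt * fc / mu
        R += dt * vR
        c += dt * vc
        fR, fc = forces(R, c)
        vR += 0.5 * dt * fR / M
        vc += 0.5 * dt * fc / mu
        max_lag = max(max_lag, abs(c - R))
        max_drift = max(max_drift, abs(physical_energy(R, c, vR) - e0))
    return max_lag, max_drift


for mu in (1e-4, 1.0):
    lag, drift = simulate_toy_cp(mu)
    print(f"mu = {mu:g}: max |c - R| = {lag:.1e}, max drift of K_N + E_pot = {drift:.1e}")
```

With mu = 1e-4, the fictitious electronic frequency is about 100 times the nuclear frequency. In that case c stays within roughly 1e-4 of its optimum and K_N + E_pot is essentially conserved. With mu = 1, which is comparable to M, c lags R by several tenths, and energy flows into the fictitious electronic motion. In this toy, the time step must also resolve the fast fictitious motion, so a smaller mu requires a shorter step.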
For finite values of $\mu$, the system moves within a thin layer above the Born-Oppenheimer surface, whose thickness is set by the fictitious electronic kinetic energy $\mathcal{K}_e$. Adiabaticity is ensured if the highest frequency of the nuclear motion, $\omega_I$, is well separated from the lowest frequency associated with the fictitious motion of the electronic degrees of freedom, $\omega_e$.
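The separation condition can be made concrete with the commonly quoted harmonic estimate for the lowest fictitious electronic frequency, ω_e ≈ (2E_g/μ)^{1/2} in atomic units, where E_g is the electronic gap. The Python sketch below applies this estimate and converts the result to wavenumbers for comparison with a nuclear vibration. The gap of 0.2 hartree, the two values of μ, and the use of an O-H stretch at about 3700 cm⁻¹ as the fastest nuclear mode are all illustrative assumptions.

```python
import math

HARTREE_IN_CM = 219474.63  # 1 hartree expressed in cm^-1


def min_electronic_frequency_cm(gap_hartree, mu_au):
    """Estimate omega_e = sqrt(2 * E_g / mu) and express it in cm^-1.

    In atomic units (hbar = 1) an angular frequency is numerically equal to
    an energy in hartree, so it converts with the hartree-to-cm^-1 factor.
    """
    return math.sqrt(2.0 * gap_hartree / mu_au) * HARTREE_IN_CM


nuclear_max_cm = 3700.0  # assumed fastest nuclear mode (roughly an O-H stretch)
for mu in (400.0, 1000.0):
    w_e = min_electronic_frequency_cm(0.2, mu)  # assumed 0.2 hartree gap
    print(f"mu = {mu:.0f} a.u.: omega_e ~ {w_e:.0f} cm^-1 "
          f"({w_e / nuclear_max_cm:.1f} x the assumed nuclear maximum)")
```

Because ω_e falls off as μ^(-1/2), increasing μ moves the electronic spectrum towards the nuclear one. A larger μ would permit a longer time step, so the choice of μ is a compromise between adiabaticity and efficiency.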
For systems with a finite gap $E_g$, the parameter $\mu$ can be used to shift the electronic frequency spectrum, whose lower edge scales as $(E_g/\mu)^{1/2}$. The lowest electronic frequency $\omega_e$ can thereby be made much higher than the highest nuclear frequency $\omega_I$, so that no energy is transferred between the nuclear and electronic subsystems. For metallic systems, special variants of the original method have to be adopted [12]. In practice, it is easy to check whether adiabatic conditions are fulfilled by monitoring the conservation of the energy associated with the physical Lagrangian in Eq. 6. Eqs. 4 and 5 (or analogous first-order equations) can be used for a simultaneous optimization of electronic and nuclear degrees of freedom. They can also be used to generate classical nuclear trajectories on a quantum-mechanical potential energy surface: after an initial optimization of the electronic wavefunctions for a given starting configuration, ionic and electronic degrees of freedom can be propagated in parallel along the Born-Oppenheimer surface. The Car-Parrinello method is similar in spirit to the extended-Lagrangian methods for constant-temperature [13] or constant-pressure dynamics [14]. Extensions of the original scheme to the canonical NVT ensemble, the NPT ensemble or variable-cell constant-pressure dynamics [15] are therefore straightforward [16]. Quantum effects on the ionic motion can also easily be included within a path-integral formalism [17]. Most current implementations use the original Car-Parrinello scheme based on DFT. The system is treated within periodic boundary conditions, and the Kohn-Sham one-electron orbitals $\psi_i$ are expanded in a basis set of plane waves with wave vectors $\mathbf{G}$:

$$\psi_i(\mathbf{r}) = \frac{1}{\sqrt{\Omega}} \sum_{\mathbf{G}} c_i(\mathbf{G})\, e^{i \mathbf{G} \cdot \mathbf{r}} \tag{9}$$

where $\Omega$ is the volume of the periodic cell and the sum runs up to a given kinetic-energy cutoff $E_{\mathrm{cut}}$. In such a scheme, an adequate treatment of the inner-core electrons would require prohibitively large basis sets. Therefore, only the valence electrons are treated explicitly, and the effect of the ionic cores is integrated out using an ab initio pseudopotential formalism. Because of the periodic boundary conditions, the treatment of charged systems needs special care, and different methods are available for this purpose [18]. Apart from the traditional scheme, Car-Parrinello approaches using semiempirical [19], Hartree-Fock [19, 20] or GVB [21] electronic structure methods have been proposed, and extensions to augmented plane waves [22] and to hybrid basis sets of atom-centered functions and plane waves [23] have been implemented. Recently, Car-Parrinello schemes have also been extended to a mixed quantum/classical QM/MM approach [24]. Unless mentioned otherwise, all the calculations presented in the next sections use the original Car-Parrinello scheme based on (gradient-corrected) density functional theory, within a pseudopotential approach and with a plane-wave basis set.
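The size of a plane-wave expansion such as Eq. 9 is controlled by the cutoff. In atomic units, a plane wave with wave vector G has kinetic energy |G|²/2. The basis therefore contains every reciprocal-lattice vector inside a sphere of radius (2E_cut)^{1/2}, which gives roughly N ≈ Ω(2E_cut)^{3/2}/(6π²) functions. The Python sketch below counts these vectors explicitly for a simple cubic cell of side L and compares the count with that estimate. The cubic cell and the numerical values are hypothetical choices made for illustration.

```python
import math


def count_plane_waves(box_length, e_cut):
    """Count G = (2*pi/L)*(n1, n2, n3) with |G|^2/2 <= e_cut (atomic units, cubic cell)."""
    g_unit = 2.0 * math.pi / box_length
    n_max = int(math.sqrt(2.0 * e_cut) / g_unit) + 1
    count = 0
    for n1 in range(-n_max, n_max + 1):
        for n2 in range(-n_max, n_max + 1):
            for n3 in range(-n_max, n_max + 1):
                kinetic = 0.5 * g_unit**2 * (n1 * n1 + n2 * n2 + n3 * n3)
                if kinetic <= e_cut:
                    count += 1
    return count


def estimate_plane_waves(volume, e_cut):
    """Sphere-volume estimate N ~ Omega * (2*E_cut)**1.5 / (6*pi**2)."""
    return volume * (2.0 * e_cut) ** 1.5 / (6.0 * math.pi**2)


L, e_cut = 20.0, 35.0  # illustrative: 20 bohr cubic cell, 35 hartree (70 Ry) cutoff
print(count_plane_waves(L, e_cut), round(estimate_plane_waves(L**3, e_cut)))
```

For these values, both numbers come out at about 7.9 × 10⁴ coefficients per orbital. Because the count grows linearly with cell volume and as E_cut^{3/2}, the pseudopotential approach described above, which keeps E_cut moderate, is essential.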
3 CAR-PARRINELLO MODELING OF BIOLOGICAL SYSTEMS
The exponential increase in computer power and the development of highly efficient algorithms have distinctly expanded the range of systems that can be treated at a first-principles level. Using parallel computers, AIMD simulations of systems with a few hundred atoms can nowadays be performed. This range already starts to approach the one relevant in biochemistry. Indeed, some simulations of entire biomolecules in laboratory-realizable conditions (such as crystals or aqueous solutions) have been performed recently [25-28]. For most applications, however, the systems are still too large to be treated fully at the AIMD level. By combining AIMD simulations with a classical MD force field in a mixed quantum mechanical/molecular mechanical fashion (hybrid AIMD), the effects of the protein environment can be taken into account explicitly and the system size can be extended. Even though it is possible to work with fairly sizeable quantum models, an intelligent choice of the crucial part of the system is still the basis of any successful modeling. The following approaches have been applied so far:
(1) AIMD simulations of the full system in laboratory-realizable conditions (e.g. in the crystal phase or in aqueous solution)
(2) AIMD calculations of carefully designed gas-phase cluster models
(3) AIMD simulations of gas-phase cluster models embedded in an external electrostatic field
(4) Hybrid QM/MM Car-Parrinello simulations, in which the quantum part is treated at the AIMD level and the surroundings are described with a classical force field
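Approaches (3) and (4) both need the electrostatic potential that the environment generates at the quantum region. The minimal Python sketch below assumes the environment is represented by fixed point charges. This is a hypothetical classical model, not the coupling scheme of any particular QM/MM code; real implementations couple the charges to the electron density and typically screen them at short range. The sketch computes the bare Coulomb potential at each cluster atom and a crude classical interaction energy with partial charges placed on those atoms, in atomic units throughout. `math.dist` requires Python 3.8 or later.

```python
import math


def external_potential(sites, point_charges):
    """Bare Coulomb potential (hartree per unit charge) of the point charges at each site.

    point_charges: iterable of (charge, (x, y, z)) pairs, positions in bohr.
    No short-range screening is applied, so sites must not coincide with charges.
    """
    return [sum(q / math.dist(r, s) for q, s in point_charges) for r in sites]


def embedding_energy(site_charges, sites, point_charges):
    """Classical interaction sum_i Q_i * phi(r_i) between cluster and environment."""
    phis = external_potential(sites, point_charges)
    return sum(Q * phi for Q, phi in zip(site_charges, phis))


# Hypothetical example: a two-atom cluster next to a single environment charge.
cluster = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
environment = [(1.0, (2.0, 0.0, 0.0))]
print(external_potential(cluster, environment))
print(embedding_energy([-0.4, 0.4], cluster, environment))
```

In approach (3), such a potential would be added as a fixed external field to the cluster Hamiltonian. In approach (4), the environment charges themselves move according to the classical force field.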
4 APPLICATIONS TO NON-ENZYMATIC SYSTEMS
4.1 Nucleic Acids
RNA and DNA are in general very difficult to model with force-field-based approaches. One major difficulty is reproducing the backbone conformation (crucial for any modeling of nucleic acids), as the corresponding torsional energy barriers are very small [29]. First results from AIMD are encouraging: the calculated structure of a hydrated GpG RNA duplex in laboratory-realizable conditions (that is, in the crystal phase) [25] showed excellent agreement with experiment and reproduced the H-bond network postulated by the crystallographers. Investigations of platinum-based drugs [30, 31] and their adducts with DNA fragments in the solid state [32] and in aqueous solution [26] confirm the reliability of an AIMD scheme for describing these systems, even in the presence of transition metal ions, which (as mentioned in Section 1) are notoriously difficult to treat with effective potentials [33].
4.2 Heme-Based Proteins
AIMD has been used extensively to elucidate structure/function relationships in myoglobin and cytochromes. Calculations on myoglobin mimics [34-39] provided a picture of the binding mode of O2 and of ligands such as CO and NO. These studies have also shed light on the intricate interplay between the structural and bonding properties of the complex on the one hand and environmental and temperature effects on the other. Two cytochromes have been studied so far. In the case of the electron-transfer protein cytochrome c, electronic structure calculations helped clarify the intriguing nature of the Fe-S bond at the active site [40], whereas for cytochrome P450, steps of the enzymatic reaction were investigated [41-44]. The P450 family of enzymes is involved in the metabolism of endogenous and xenobiotic compounds, and this work may therefore be of use in toxicology research.
4.3 Cyclic Peptides and Ion Channels
To date, two AIMD studies have been performed in this field. The first dealt with self-assembled polypeptide nanotubes [28]. These systems have a variety of potential applications in biochemistry and materials science, from optoelectronics to the construction of drug delivery vehicles. Calculations carried out on cyclo[D-Ala-Glu-D-Ala-Gln]2 in the crystal phase again showed excellent agreement with the available structural data and provided novel information on intra- and intermolecular H-bond patterns.
In the second study, proton diffusion through a polyglycine analog of the gramicidin channel was analyzed [27]. These simulations showed that the diffusion process is very rapid and, furthermore, is assisted by the polypeptide backbone, which thus emerges as a key factor for rapid proton transfer through the channel.
4.4 Photosensitive Proteins
A recent study has focused on structural and electronic aspects of a bacteriochlorophyll derivative (methyl bacteriophorbide) in the crystal phase [45]. The calculations are in good agreement with experimental data and provide evidence of a local structural change upon electronic excitation. AIMD simulations have also been carried out on the chromophore of the rhodopsin photoreceptor (retinal). In the primary event of vision, retinal passes from the ground state (GS) to an excited state (ES) and isomerizes from 11-cis to all-trans within ~200 fs. A series of papers [46-50] have analyzed the GS isomerization process. More recently, the calculations were extended to the first singlet ES [51] within a recently developed scheme for singlet-state dynamics [52]. This work characterizes the structural and energetic changes during the photoisomerization process and points to the crucial role of environmental effects.
APPLICATIONS TO ENZYMES
5.1 Introduction
Understanding enzymatic function and mechanism at the molecular level is one of the most challenging and fascinating problems of biochemistry. Furthermore, it has direct implications for pharmacological intervention, as enzymes are the targets of a large variety of therapeutic agents. Modeling such phenomena, however, is a formidable task: an appropriate method should be able to treat fairly large systems of several thousand atoms and take into account dynamical effects at finite temperature. Furthermore, for a direct investigation of the enzymatic mechanism of action, the model should also provide an adequate description of chemical reactions. AIMD simulations appear to be a promising tool for first-principles modeling of enzymes. Indeed, they enable in situ simulations of chemical reactions; they are capable of taking crucial thermal effects [53] into account; and they automatically include many of the physical effects that are so difficult to model in force-field based simulations, such as polarization effects, many-body forces, resonance stabilization of aromatic rings and hydration phenomena. In this section, the power and the current limitations of AIMD in studying enzyme function are illustrated by a survey of selected recent applications. First, we present calculations on two very well-known enzymes, which are meant as benchmark studies for subsequent applications. Then, we outline applications to pharmaceutical research, and finally we conclude this section by presenting state-of-the-art QM/MM Car-Parrinello hybrid simulations on an enzyme relevant for synthetic and biotechnological applications.
5.2 Test Cases
To probe the capabilities of ab initio molecular dynamics (AIMD) in describing enzymatic reactions, calculations have been carried out on two textbook examples, human carbonic anhydrase II and the serine proteases. As these are among the most thoroughly characterized enzymes, both theoretically and experimentally, this work has provided a basis for subsequent applications in the field.
5.2.1 Human Carbonic Anhydrase II (HCAII)
HCAII is a zinc enzyme (260 amino acids, ~29 kD) which catalyzes the reversible hydration of CO2 to bicarbonate, HCO3-. The active site is located at the bottom of a ~15 Å deep conical cavity that is open towards the solvent. With a turnover rate at room temperature of ~10^6 s^-1, HCAII is one of the fastest enzymes known. X-ray structures show that the zinc ion is coordinated to three histidine residues (His94, His96 and His119) and that a water molecule is bound to the zinc ion in an approximately tetrahedral arrangement. This water molecule has a pKa around 7-8 and can thus be easily deprotonated to OH- under physiological conditions. The zinc-bound H2O/OH- is connected via a hydrogen-bonded network (H2O/OH- → Thr199 → Glu106) to the rest of the protein. Another hydrogen-bonded network (H2O/OH- → HOH318 → HOH292 → His64) extends from the zinc-bound hydroxide/water via two solvent molecules to a histidine group located in the upper channel of the active site. The direct zinc ligands (His94, His96 and His119) and the two residues involved in the hydrogen-bonded network around the zinc-bound water (Thr199 and Glu106) are conserved in all animal carbonic anhydrases [54], and site-directed mutagenesis experiments have revealed the crucial importance of these residues for the activity of the enzyme, which they control by enforcing a precise coordination geometry at the zinc center [55]. The catalytic reaction involves binding of CO2 via a nucleophilic attack of the zinc-bound OH-, conversion to HCO3-, replacement of HCO3- by water, and regeneration of Zn-OH- through deprotonation of the zinc-bound water molecule. The latter step is rate-determining [56] and most probably involves the histidine residue His64 as a proton shuttle.
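As a quick illustration of why a pKa of 7-8 makes the zinc-bound water easy to deprotonate near neutral pH, the Henderson-Hasselbalch relation gives the deprotonated fraction of a single titratable site as 1/(1 + 10^(pKa - pH)). This sketch assumes an isolated, ideal acid and takes pH 7.4 as a representative physiological value; neither assumption comes from the original study.

```python
def deprotonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a single titratable site in its deprotonated form
    (Henderson-Hasselbalch; assumes an isolated, ideal acid)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    PHYSIOLOGICAL_PH = 7.4  # assumed representative value, not from the text
    for pka in (7.0, 7.5, 8.0):  # pKa range quoted for the zinc-bound water
        f = deprotonated_fraction(pka, PHYSIOLOGICAL_PH)
        print(f"pKa {pka:.1f}: {100 * f:.0f}% Zn-OH-")
```

Over the quoted pKa range this gives roughly 20% to 70% hydroxide at pH 7.4, so both forms are substantially populated.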
Experiments estimate a free energy barrier of ~10 kcal/mol for the overall proton transfer reaction, originating mainly from solvent reorganization or conformational changes, while the intrinsic barrier for proton transfer could be as low as 1.25 kcal/mol [56]. Our goals in this study [57] are to (i) probe the influence of the size of the quantum cluster; (ii) establish the effect of the environment, i.e. compare cluster models with QM/MM hybrid models that include the electrostatic effect of the protein environment; and (iii) study the dynamical properties. We consider several models of the active site, starting with two ab initio cluster models of different size. MOD-A (~30 atoms) consists of a zinc-trisimidazole complex with a water or a hydroxide group as the fourth ligand. MOD-B (~90 atoms) includes the tetrahedrally coordinated zinc center (Zn2+-H2O/OH-, His94, His96, His119) and the essential residues involved in the hydrogen-bonding network (Thr199 and Glu106). The eight ordered water molecules resolved in the crystal structure within a distance of 7 Å from the zinc ion have also been included. The residues were fixed at a position close to the backbone and otherwise left free.
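The measured barrier and turnover number can be cross-checked with the Eyring equation of transition state theory, k = (k_B T/h) exp(-ΔG‡/RT). This is only a rough consistency check under assumptions that the original study does not make: a transmission coefficient of 1, T = 298.15 K, and treatment of the ~10 kcal/mol overall barrier as a single activated step.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J s
R_KCAL = 1.987204e-3       # gas constant, kcal/(mol K)


def eyring_rate(dg_kcal: float, temperature: float = 298.15) -> float:
    """Rate constant (s^-1) from transition state theory, transmission coefficient 1."""
    prefactor = K_B * temperature / H_PLANCK
    return prefactor * math.exp(-dg_kcal / (R_KCAL * temperature))


def eyring_barrier(rate: float, temperature: float = 298.15) -> float:
    """Inverse relation: free energy barrier (kcal/mol) implied by a rate (s^-1)."""
    prefactor = K_B * temperature / H_PLANCK
    return R_KCAL * temperature * math.log(prefactor / rate)


if __name__ == "__main__":
    print(f"k(10 kcal/mol)   = {eyring_rate(10.0):.1e} s^-1")
    print(f"dG for 1e6 s^-1  = {eyring_barrier(1.0e6):.1f} kcal/mol")
    print(f"k(1.25 kcal/mol) = {eyring_rate(1.25):.1e} s^-1")
```

Under these assumptions a 10 kcal/mol barrier corresponds to a few times 10^5 s^-1, within an order of magnitude of the ~10^6 s^-1 turnover.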
Figure 1: Graphical representation of model B. Atoms that are kept fixed during the simulations are indicated with a circle. Dummy hydrogen atoms are indicated with white balls. (Reproduced with permission from ref. [57], Copyright 1998 Am. Chem. Soc.)

Dummy hydrogen atoms have been used to saturate the QM model where covalent bonds had to be broken to cut the cluster model out of the rest of the protein. Figure 1 shows a graphical representation of MOD-B. MOD-C is an extension of MOD-B that takes the external electrostatic field of the protein into account. The electrostatic background is represented by Gaussian-broadened point charges located within a distance of 7.5-9 Å from the centre of the simulation box. We have tested charge sets from the AMBER 4.0 [58] and GROMOS96 [59] force fields and have also probed the effect of different charge exclusion schemes. Our calculations show that the smallest quantum model (MOD-A) provides an adequate description of neither structural nor electronic nor dynamical properties. In contrast, a cluster model of the size of MOD-B reproduces the structural properties of the real system quite accurately and also provides a qualitative description of the electronic and dynamic features.

Structural Properties. As an example of a characteristic structural property, the zinc-oxygen bond distance for models A and B is compared in Table 1. In the case of the zinc-trisimidazole complex (MOD A), the zinc-oxygen distance changes distinctly upon deprotonation of the zinc-bound water (Δd = 0.32 Å). Such a drastic change of the zinc-oxygen distance is not observed when comparing the crystal structures at low and high pH [60]. Apparently, in the real enzyme the protein environment helps to stabilize the zinc-oxygen distance during protonation/deprotonation. This clearly shows that such a simplified model is not able to capture the main structural features of the real enzyme.
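The text does not spell out the charge exclusion schemes used at the QM/MM boundary. As a hypothetical sketch of what a bonded exclusion rule can look like, the helper below zeroes the MM point charges lying within a given number of bonds of any QM atom (max_bonds=3 removes the 1-2, 1-3 and 1-4 partners). The function names, the bond-graph input and the choice to zero rather than redistribute charges are illustrative assumptions; Gaussian broadening and the 7.5-9 Å cutoff are not modeled.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def atoms_within_bonds(bonds: Iterable[Tuple[int, int]],
                       qm_atoms: Set[int],
                       max_bonds: int = 3) -> Set[int]:
    """Non-QM atoms separated from some QM atom by 1..max_bonds bonds
    (multi-source breadth-first search on the bond graph)."""
    graph: Dict[int, Set[int]] = {}
    for a, b in bonds:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    depth = {atom: 0 for atom in qm_atoms}  # QM atoms are the BFS sources
    queue = deque(qm_atoms)
    while queue:
        atom = queue.popleft()
        if depth[atom] == max_bonds:
            continue  # do not search beyond the exclusion shell
        for neighbour in graph.get(atom, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[atom] + 1
                queue.append(neighbour)
    return {atom for atom, d in depth.items() if d > 0}


def embedding_charges(charges: Dict[int, float],
                      bonds: Iterable[Tuple[int, int]],
                      qm_atoms: Set[int],
                      max_bonds: int = 3) -> Dict[int, float]:
    """MM charges seen by the QM region, with bonded neighbours zeroed."""
    excluded = atoms_within_bonds(bonds, qm_atoms, max_bonds)
    return {i: (0.0 if i in excluded else q)
            for i, q in charges.items() if i not in qm_atoms}


if __name__ == "__main__":
    # Hypothetical linear chain 0-1-2-3-4-5 with atom 0 in the QM region.
    chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    toy_charges = {i: 0.1 * i for i in range(6)}
    print(embedding_charges(toy_charges, chain, {0}))
```

Setting max_bonds to a smaller value keeps more of the nearby charges, which is the kind of change that, according to the text, can strongly overpolarize the QM region.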
Table 1. Zinc-Oxygen Distances of Different Model Complexes
Method   MOD A (OH/HOH)   MOD B (OH/HOH)
BLYP     1.91/2.23        1.94/2.02
exp      2.05/2.05¹       2.05/2.05¹
exp: experimental values. All distances are given in Angstrom. ¹Values of the experimental crystal structures of the high- and low-pH forms of the enzyme [60].
In contrast, model B is able to retain the appropriate structure of the active site. In particular, the zinc-oxygen distances in the hydroxide and in the water form are now similar (Δd = 0.08 Å).
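The Δd values quoted for the two models follow directly from Table 1 as the difference between the water and hydroxide forms. The snippet below only reproduces that arithmetic; the dictionary layout is an illustrative choice.

```python
from typing import Tuple

# Zn-O distances in Angstrom from Table 1 (BLYP): (OH- form, H2O form).
ZN_O_BLYP = {"MOD A": (1.91, 2.23), "MOD B": (1.94, 2.02)}
ZN_O_EXP = (2.05, 2.05)  # high- and low-pH crystal structures [60]


def protonation_shift(pair: Tuple[float, float]) -> float:
    """Change of the Zn-O distance on protonating the zinc-bound hydroxide."""
    oh_form, water_form = pair
    return round(water_form - oh_form, 2)


if __name__ == "__main__":
    print(f"exp: delta d = {protonation_shift(ZN_O_EXP):.2f} A")
    for model, pair in ZN_O_BLYP.items():
        print(f"{model}: delta d = {protonation_shift(pair):.2f} A")
```

Against the experimental shift of zero, MOD B deviates by 0.08 Å and MOD A by 0.32 Å.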
Electronic Properties: Effects of the Surrounding. The proton affinity of the zinc-bound water molecule is a key property for the enzymatic mechanism. The acidity of the zinc-bound water is the result of a subtle fine-tuning via hydrogen-bonded networks and electrostatic environment effects. This quantity can thus serve as a sensitive indicator of differences in the electronic structure that will have a critical influence on the enzymatic reaction. As a first attempt to quantify the effect of the electrostatic environment and of the size of the cluster model, we have therefore calculated the proton affinities for the different models (Table 2). The small cluster model A has a distinctly smaller proton affinity than the larger cluster model B. The inclusion of the environment does not change this value significantly. Point charge sets from two different force fields result in almost identical values, even though the absolute magnitudes of the specific charges differ appreciably in some cases. A large overpolarization effect is, however, induced if the 1-4 electrostatic interactions with the QM part are retained. To the best of our knowledge, the proton affinity of HCAII is not known experimentally. The only experimental values available for a rough bracketing are the gas-phase proton affinities of water (166.7 kcal/mol) [61] and OH- (390.8 kcal/mol) [62]. In view of these values, a proton affinity of 433 kcal/mol, calculated without applying Coulomb exclusion rules at the QM/MM interface, is clearly far too high. This indicates that point charges close to the QM part can induce a large overpolarization and have to be treated with care.

Table 2. Proton Affinity of the Zinc-bound Water (BLYP)
Model      PA (kcal/mol)
MOD A      184
MOD B      268
GROMOS     269
AMBER      268
AMBERinc   433
GROMOS: MOD B with the point charge set of GROMOS96 (electrostatic 1-4 interactions with the QM part are excluded); AMBER: MOD B with the point charge set of AMBER 4.0 (electrostatic 1-4 interactions with the QM part are excluded); AMBERinc: MOD B with the point charge set of AMBER 4.0 (electrostatic 1-4 interactions with the QM part are included). All energies are given in kcal/mol.
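The bracketing argument can be made explicit. The sketch below assumes that a proton affinity is obtained as the difference of the total electronic energies of the deprotonated and protonated species (the bare proton has no electronic energy; zero-point and thermal corrections are ignored). This is a common convention, not necessarily the exact protocol of ref. [57]. It then checks the Table 2 entries against the two gas-phase reference values quoted in the text.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree

PA_WATER = 166.7      # gas-phase proton affinity of H2O, kcal/mol [61]
PA_HYDROXIDE = 390.8  # gas-phase proton affinity of OH-, kcal/mol [62]

# Table 2 values (BLYP, kcal/mol)
TABLE_2 = {"MOD A": 184, "MOD B": 268, "GROMOS": 269, "AMBER": 268, "AMBERinc": 433}


def proton_affinity(e_base: float, e_acid: float) -> float:
    """PA = E(base) - E(protonated form), electronic energies in hartree.
    Zero-point and thermal contributions are neglected in this sketch."""
    return (e_base - e_acid) * HARTREE_TO_KCAL


def outside_bracket(pa: float, low: float = PA_WATER, high: float = PA_HYDROXIDE) -> bool:
    """True if a proton affinity falls outside the H2O/OH- reference bracket."""
    return not (low <= pa <= high)


if __name__ == "__main__":
    flagged = [name for name, pa in TABLE_2.items() if outside_bracket(pa)]
    print("outside the H2O/OH- bracket:", flagged)
```

Only the AMBERinc value, obtained with the 1-4 interactions retained, falls outside the bracket.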
Dynamical Properties: The Proton Transfer Reaction. We have investigated the dynamical properties of the two gas-phase cluster models within the local density approximation (LDA). As with the structural properties, the dynamical properties of the minimal cluster model A also differ clearly from those of the real system. A short MD simulation (1 ps) at room temperature shows that the zinc-bound hydroxide can rotate around the Zn-O axis. This is in contrast to the real protein, where the zinc-bound nucleophile is held by the hydrogen-bonded network of Thr199 and Glu106 in a defined orientation [63] appropriate for the binding of CO2. Furthermore, the mobility of the imidazole rings is much higher than that of the corresponding histidine residues, which are kept quite rigidly in place, as indicated by the temperature factors of 5-10 reported for the crystal structure [62]. To investigate the dynamical properties of the larger cluster model B, we have performed a 1 ps MD simulation at body temperature. Being aware that the LDA significantly underestimates proton transfer barriers, we used this simulation to make an efficient scan of the potential energy surface of the system 'with reduced barriers'. During these MD runs a spontaneous proton transfer reaction is observed. Starting from the hydroxide form of the enzyme, a proton from a neighboring water molecule (HOH 318) is transferred to the zinc-bound OH-, and the charged defect can be passed on to the next water molecule by a further switch of a proton. In these proton transfer reactions, several hydrogen bonds (between the zinc-bound water, HOH 318 and HOH 292) shorten simultaneously, and protons can be exchanged easily back and forth via these three water molecules, which form a kind of proton-exchange pathway. The water molecules involved in this process are indeed the ones connected in the real protein via a hydrogen-bonded network to the hypothetical proton shuttle group His64.
Figure 2 shows the temporal evolution of the four oxygen-hydrogen distances (ZnHO···H-O-H(318)···OH2(292)) that form the proton relay. Two of these OH distances correspond to covalent O-H bonds (OH distances of 1.0-1.2 Å in Figure 2) and two to hydrogen-bonded O···H distances of ~1.6 Å (the hydrogen-bonded O···H distances are somewhat shorter than expected, owing to the overestimation of hydrogen bonding within the LDA). Figure 2 shows that the monitored pairs of oxygen and hydrogen atoms can change their mutual distance from covalent to hydrogen-bonded and vice versa, i.e. the protons can be exchanged between two neighboring oxygen atoms. Prior to such a proton transfer, the OH distances involved in the relay adjust simultaneously to a similar value around 1.2-1.3 Å (corresponding to the symmetric position of the hydrogen between two oxygen atoms). Such a concerted change can be seen around 200, 340 and 440 fs. During our simulation, only the zinc-bound water molecule exchanges its proton via this pathway, and no other proton transfers were observed along different hydrogen-bonded networks. The findings of our simulations are thus in very good agreement with the proposed role of His64 as proton shuttle group. The fact that part of the enzymatic reaction cycle can be observed directly is very encouraging. Our approach is completely bias-free in the sense that no knowledge about likely reactions or reaction coordinates is necessary. Such an unbiased approach seems particularly promising for the study of systems whose enzymatic reaction is not yet known in detail.
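The criterion used to read Figure 2 (all relay O-H distances converging on ~1.2-1.3 Å just before a transfer) can be turned into a simple scan of a distance trajectory. This sketch is hypothetical: the window bounds, the input layout (one tuple of monitored distances per frame) and the function name are assumptions, and the toy trajectory in the usage lines is synthetic, not data from ref. [57].

```python
from typing import List, Optional, Sequence, Tuple


def concerted_windows(times: Sequence[float],
                      frames: Sequence[Sequence[float]],
                      lo: float = 1.15, hi: float = 1.35) -> List[Tuple[float, float]]:
    """Return (start, end) time windows in which every monitored O-H distance
    lies inside [lo, hi] Angstrom, i.e. all relay protons sit near the
    midpoint between neighbouring oxygens at the same time."""
    windows: List[Tuple[float, float]] = []
    start: Optional[float] = None
    prev_t: Optional[float] = None
    for t, dists in zip(times, frames):
        inside = all(lo <= d <= hi for d in dists)
        if inside and start is None:
            start = t  # a concerted configuration begins
        elif not inside and start is not None:
            windows.append((start, prev_t))  # it ended at the previous frame
            start = None
        prev_t = t
    if start is not None:
        windows.append((start, prev_t))  # still open at the last frame
    return windows


if __name__ == "__main__":
    # Synthetic two-distance trajectory (Angstrom), not data from ref. [57].
    times_fs = [0, 100, 200, 300, 400]
    frames = [(1.0, 1.6), (1.1, 1.5), (1.22, 1.28), (1.6, 1.0), (1.0, 1.6)]
    print(concerted_windows(times_fs, frames))
```

Requiring all distances at once, rather than any single one, mirrors the concerted character of the events described in the text.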
Figure 2: Temporal evolution of four characteristic oxygen-hydrogen distances involved in the proton relay. Note the simultaneous contraction of the O-H distances around 200, 340 and 440 fs prior to a proton transfer event. (Reproduced with permission from ref. [57], Copyright 1998 Am.Chem.Soc.)
5.2.2 Serine Proteases
The serine proteases (SPRs) are one of the most studied enzyme families [64-74]. SPRs use the catalytic triad (Ser195-His57-Asp102) to catalyze the hydrolysis of peptides (Fig. 3a). This occurs through nucleophilic addition of the γ-hydroxyl group of Ser195 to the acyl carbonyl of the substrate, with formation of a negatively charged tetrahedral intermediate (Fig. 3b). The intermediate is stabilized by two H-bonds, either with the amide groups of Ser195 and Gly193 (mammalian isoenzymes [65]) or with the amide group of Ser195 and the side chain of Asn155 (bacterial isoenzymes [75]). Theoretical [76, 77] and experimental [75, 78] studies on the wild type and mutants of a bacterial SPR (subtilisin) have shown that Asn155 is a key residue for the biological function, in that it stabilizes the transition state (TS) relative to the ground state (GS) by as much as ~5 kcal/mol. Curiously, no corresponding studies on the mammalian isoenzymes have appeared to clarify the crucial role of Gly193. A second important H-bond interaction involves two residues of the catalytic triad, His57 and Asp102. A series of NMR studies on mammalian [72-74,79] and bacterial [80] SPRs and their complexes with inhibitors have indicated the presence of a low-barrier hydrogen bond (LBHB) linking Nδ1 of protonated His57 with the β-carboxyl group of Asp102 (Fig. 3) [72-74,79]. Approach to the TS is suggested to facilitate the formation of the LBHB, which in turn may render Nε2 of His57 a stronger base for accepting a proton from Ser195 in the formation of the intermediate [72-74,79]. As a result of this process, the free energy barrier of the TS relative to the GS could decrease (although this point is the object of some controversy [81, 82]). To provide a picture of the chemical bonding in SPRs, and to relate it to the biological function, we have carried out ab initio molecular dynamics simulations on models of the SPR-intermediate (I-SPR) and SPR-substrate (S-SPR) complexes (Fig. 4) [83, 84].

[Figure 3 schematic, panels (a) and (b): active-site residues S195, G193, Q192 and D102 with the substrate (R, R')]
Figure 3: Schematic views of the H-bond network in the mammalian serine protease active site (a) and of the adduct with the intermediate of the enzymatic reaction (b). In (b) the double-arrow symbol refers to a putative low-barrier H-bond. (Reproduced with permission, Copyright 1999 Wiley.)

I-SPR dynamics. Consistent with NMR studies [79], proton hopping occurs between His57 and Asp102 on the subpicosecond time-scale. Analysis of the chemical bonding indicates that the interaction is covalent in nature [83]. The second fundamental H-bond interaction investigated here involves Gly193 and the carbonyl oxygen of the intermediate. This H-bond is well maintained during the dynamics (average O···H distance of 1.7(0.1) Å). A rough estimate of the interaction energy based on an electrostatic model [83] indicates that Gly193 stabilizes the intermediate by more than 10 kcal/mol (Tab. 3). This value appears too large for a hydrogen bond [85, 86]. Inspection of the structure reveals that the very large dipole of the Q192G193 peptide unit (~4 D [67]) could also be an important factor in stabilizing the intermediate, as it points towards the negative charge of the intermediate. To extract the peptide-dipole contribution from the total stabilization energy, we constructed a second model complex in which the Q192G193 peptide unit is substituted by dimethylamine (III in Fig. 4c). Tab. 3 shows that the resulting stabilization is much smaller, of the order of only a few kcal/mol. Thus, we conclude that a large part of the transition state stabilization is due to charge-dipole interactions.

Figure 4: Serine proteases: model complexes representing I-SPR ((a) and (c)) and S-SPR ((b) and (d)). In (c) and (d) the Q192G193 peptide unit is replaced by dimethylamine. H-bonds are depicted with dashed lines. Arrows indicate the scissile carbon atom Cs, which is labeled only in (b) for clarity. (Reproduced with permission from ref. [83], Copyright 1999 Wiley.)

S-SPR dynamics. The two key H-bond interactions are maintained, but no proton transfer occurs. Interestingly, the substrate-protein interaction energy turns out to be much lower than that of the I-SPR complex (Tab. 3). Replacing the Q192G193 peptide with dimethylamine (complex IV) causes a drastic decrease of the interaction energy, which becomes practically identical to that of complex III (Tab. 3). We conclude that H-bond interactions are similar in the S-SPR and I-SPR complexes. In contrast, the electrostatic (charge-dipole) interactions are very different, the I-SPR being more stable than S-SPR by ~6 kcal/mol (Tab. 3). For these complexes it has also been possible to calculate the binding energies. Tab. 3 shows a qualitative agreement between binding and electrostatic energies. This result validates the use of the electrostatic model for a qualitative analysis of intermolecular interactions, as done in this work.

Table 3: Serine proteases. Intermediate- and substrate-Q192G193 peptide unit interactions in terms of electrostatic (Elec.) and binding energies (B.E.) (in kcal/mol).
Interaction                 Elec.     B.E.
ΔE(I-SPR) (Complex I)       -12(4)    -
ΔE(I-SPR) (Complex III)     -2.6      -
ΔE(S-SPR) (Complex II)      -6(2)     -4.2
ΔE(S-SPR) (Complex IV)      -2.6      -1.5

Our calculations are fully consistent with, and confirm, the existence of an LBHB between His57 and Asp102, which has been observed experimentally in transition state analog inhibitor complexes [72-74,79]. Furthermore, they strongly support the proposal of an LBHB-facilitated mechanism [79], as the LBHB is essentially covalent in nature. Thus, the energy supplied by the covalent interaction may be crucial to overcome the energy loss due to the compression of the two residues, which is a prerequisite for the postulated LBHB-based reaction [79]. The second conclusion is that the rather large Gly193-induced stabilization of the transition state with respect to the ground state is not caused by an H-bond with Gly193, as commonly proposed [65, 66]: indeed, the H-bond favors the binding of both substrate and intermediate by ~2.6 kcal/mol, a value typical of a strong H-bond in biological systems [86]. Instead, the negatively charged transition state turns out to be more stable relative to S-SPR by several kcal/mol as a result of the interaction of the negative charge with the large dipole of the Q192G193 peptide unit.
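To see why the orientation of the Q192G193 peptide dipole matters, the charge-dipole term can be written in the point-dipole approximation, E = q μ cos θ / (4πε₀ ε r²). The sketch below is not the electrostatic model of ref. [83]; it is a textbook approximation with an assumed dielectric factor ε, and the distance used in the usage lines is hypothetical. It only illustrates that an anion in front of the positive end of a ~4 D dipole is stabilized, whereas the reverse orientation is destabilizing.

```python
import math

COULOMB_KCAL = 332.0637          # kcal*Angstrom/(mol*e^2)
DEBYE_TO_E_ANGSTROM = 0.2081943  # 1 D expressed in e*Angstrom


def charge_dipole_energy(q: float, mu_debye: float, r: float,
                         theta_deg: float, epsilon: float = 1.0) -> float:
    """Energy (kcal/mol) of a point charge q (e) at distance r (Angstrom) from a
    point dipole mu (D). theta is the angle between the dipole vector (pointing
    from - to +) and the vector from the dipole to the charge."""
    mu = mu_debye * DEBYE_TO_E_ANGSTROM
    return COULOMB_KCAL * q * mu * math.cos(math.radians(theta_deg)) / (epsilon * r ** 2)


if __name__ == "__main__":
    # Hypothetical geometry: unit anion 4 Angstrom from a 4 D dipole, vacuum.
    for theta in (0.0, 90.0, 180.0):
        e = charge_dipole_energy(-1.0, 4.0, 4.0, theta)
        print(f"theta = {theta:5.1f} deg: E = {e:6.1f} kcal/mol")
```

The 1/r² dependence and the cos θ factor are why small changes in the position or orientation of the peptide unit, for example upon mutation, can change this stabilization markedly.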
A simulation in which dimethylamine replaces the Q192G193 peptide unit confirms the crucial role of the dipole: the absence of the stabilizing charge-dipole interaction renders the intermediate species unstable. These considerations suggest that site-directed mutagenesis at positions 192 and/or 193 might significantly affect the activity of SPRs, as the Q192G193 dipole orientation may no longer be optimal for transition state stabilization.
5.3 Enzymes as Targets for Pharmaceutical Intervention
Molecular dynamics calculations based on force fields are a fundamental tool for designing new and more powerful drugs for specific molecular targets [87]. Based on the large number of 3D biological structures available today, these calculations have led to major advances in our understanding of macromolecular structures and molecular similarity, and in the identification of pharmacophores. However, the force-field approach is not devoid of problems, which lie in the intricate physico-chemical nature of intermolecular interactions. Indeed, it is becoming increasingly clear, from both experiment and theory, that electronic structure effects may play a crucial role in ligand-receptor interactions and enzyme-inhibitor binding. Examples include bond-forming/bond-breaking processes such as low-barrier hydrogen bonds, as well as charge transfer and polarization effects. All these phenomena are more reliably described by ab initio quantum-chemical methods, and in this respect AIMD presents itself as a promising new tool. In the next paragraphs, we describe our work on the main targets for therapeutic intervention in AIDS, the enzymes protease and reverse transcriptase from human immunodeficiency virus type 1 (HIV-1 PR and HIV-1 RT). Subsequently, we focus on an enzyme of relevance for anticancer research.
5.3.1 HIV-1 Protease (HIV-1 PR)
HIV-1 PR cleaves the multidomain protein encoded by the virus genome to yield separate structural proteins. Structure-based drug-design studies have shown that in the substrate-cleavage site (two Asp-Thr-Gly loops at the subunit-subunit interface, Fig. 5a) the almost coplanar conformation of the catalytic Asp dyad is crucial for enzymatic function and for the binding of both substrate and inhibitors [88-90].
Figure 5: (a) Structure of HIV-1 PR [103] and its cleavage site; (b) models used for the ab initio molecular dynamics of the mono-protonated form. (Reproduced with permission from ref. [100], Copyright 2000 Wiley.)

Based on these structures, force-field based molecular dynamics (MD) simulations have been used to probe the binding of novel ligands [91-99]. However, these approaches have encountered difficulties in adequately describing the interactions of the catalytic aspartyl pair [91-99]. As a result, ad hoc assumptions have often been introduced in the calculations. Among these are (i) the choice of charge distribution [95]; (ii) the application of geometric constraints between the carboxylate moieties [96, 97]; and (iii) the positioning of the proton midway between the two adjacent Asp groups [99]. These a posteriori models therefore do not reveal the physico-chemical origin of the stability of the active site. Quantum-mechanical approaches appear ideally suited to provide an understanding of the underlying molecular interactions of the Asp dyad. Here, we present results from our ab initio MD simulations [100]. This investigation, which focuses on the free enzyme, is divided into two steps. First, we attempt to determine the protonation state of the Asp dyad [101, 102]. Then, we study the conformational flexibility of the Asp dyad of HIV-1 protease on models of increasing complexity, including also the protein electrostatic potential. Our model complexes of the HIV-1 PR active site (Fig. 5b) were constructed starting from the structure of the free enzyme [103]. From the X-ray structure, it has been inferred that a water molecule bridges the two Asp groups (Wat_b hereafter), even though its exact location has not been provided. We positioned Wat_b so as to form the H-bond patterns already proposed for the eukaryotic isoenzyme [104] and added two other water molecules, putatively present in the active site channel, which interact with the Asp dyad.

Protonation State. At the optimal pH for enzymatic activity (~5-6) [101, 102, 105], the Asp dyad can in principle exist in three protonation states: a deprotonated, a mono-protonated or a doubly protonated form. Because hydrogen atoms are invisible in the X-ray structure, evidence for a specific protonation state must be inferred indirectly from spectroscopic or titration measurements. Up to now, the existence of the doubly protonated, neutral form has not been proposed for the free enzyme. The existence of the deprotonated, doubly negative form is supported by a recent NMR study [102] at pH 6. However, this study has been subjected to criticism [106] and is not conclusive. Our ab initio simulations of this form show that the Asp dyad is unstable even on the ps timescale because of the strong Asp-Asp repulsion, which turns out to be ~+30 kcal/mol (as estimated with a simple electrostatic model [100]). Thus, our calculations do not support the existence of this form. The third possible state is the mono-protonated one, which has been strongly suggested by both experiment and theory [101, 106].
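An electrostatic repulsion estimate of this kind amounts to a Coulomb sum over point charges, E = 332.06 Σ q_i q_j / (ε r_ij) in kcal/mol with charges in e and distances in Å. The sketch below is not the model of ref. [100]: the two-oxygen representation of each carboxylate, the -0.5 e charges, the coordinates and the dielectric values in the usage lines are all hypothetical, and the resulting number depends strongly on these choices.

```python
import math
from typing import Sequence, Tuple

COULOMB_KCAL = 332.0637  # kcal*Angstrom/(mol*e^2)
Charge = Tuple[float, Tuple[float, float, float]]  # (q in e, (x, y, z) in Angstrom)


def interaction_energy(group_a: Sequence[Charge], group_b: Sequence[Charge],
                       epsilon: float = 1.0) -> float:
    """Coulomb interaction energy (kcal/mol) between two sets of point charges,
    screened by a uniform dielectric constant epsilon."""
    total = 0.0
    for qa, ra in group_a:
        for qb, rb in group_b:
            total += qa * qb / math.dist(ra, rb)
    return COULOMB_KCAL * total / epsilon


if __name__ == "__main__":
    # Hypothetical geometry: each carboxylate as two oxygens carrying -0.5 e.
    asp_a = [(-0.5, (0.0, 0.0, 0.0)), (-0.5, (2.2, 0.0, 0.0))]
    asp_b = [(-0.5, (0.0, 3.0, 0.0)), (-0.5, (2.2, 3.0, 0.0))]
    for eps in (1.0, 4.0):
        print(f"epsilon = {eps}: E = {interaction_energy(asp_a, asp_b, eps):.1f} kcal/mol")
```

A positive value signals net repulsion between the two anionic groups; it is this repulsion that, in the text, renders the doubly deprotonated dyad unstable.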
The ab initio energy minimizations performed on relatively large models of the two protomers C and B indicate that [100]: (i) C is lower in energy than B by 1.1 kcal/mol; (ii) the conformation of C is close to the X-ray structure but that of B is not; and (iii) the location of Wat_b is close to the observed electronic density in C but not in B. Inclusion of additional water molecules of the active site channel is expected to stabilize protomer C further relative to B, because Wat_b can form additional H-bonds in C but not in B. In conclusion, our calculations provide strong evidence in favor of protomer C in free HIV-1 PR, and all subsequent calculations have been done on complex C or its derivatives.

Simulations of the Cleavage Site with Models of Increasing Levels of Complexity. The ab initio MD simulation of the simple Asp dyad, Complex C(I) (Fig. 5b), demonstrates hopping of the aspartyl proton between the oxygen atoms already on the subpicosecond time-scale (Fig. 6b): the two Oδ1 atoms oscillate around a very short equilibrium distance. The presence of this low-barrier hydrogen bond (LBHB) confirms previous findings based on quantum dynamical studies [107]. During the dynamics, the LBHB compensates for the strong repulsion between the two Asp Oδ1 atoms (average Oδ1-Oδ1 distance 2.5(0.1) Å), which is consistent with the suggestion that this type of interaction can provide several kcal/mol of stabilization energy [74, 108]. While the LBHB is maintained, the coplanarity is completely disrupted (Fig. 6b). We conclude that the LBHB is able to keep the proton-sharing oxygen atoms close to each other, but the repulsion of the other oxygens of the carboxylates renders the system unstable. Inclusion of hydration and of the hydrogen-bond interaction with the glycine residues (Complex C(II), Fig. 5b) is not sufficient to produce a stable conformation: besides the loss of the characteristic orientation of the Asp groups, the Asp-Asp hydrogen bond is also disrupted (Fig. 6c). Inspection of the X-ray structures of unbound and complexed HIV-1 PR [103,109-115] offers an explanation for the instability of the system: in all the structures investigated, the rather rigid Gly amide groups do not form an optimum hydrogen bond with the Asp groups, with ∠(N-H···Oδ1) ranging from 125 to 153°. Thus, the carboxylate groups rearrange unphysically to maximize H-bond stabilization (maximum ∠(N-H···Oδ1) = 179°), thereby removing the aspartyl hydrogen bond. We therefore conclude that H-bonding to Gly27(27') is not an essential factor for the stability of the Asp dyad.

[Figure 6 plots: proton location, Oδ1···H distances vs. time, and Asp25/Asp25' structures]
Figure 6: HIV-1 protease: ab initio molecular dynamics of complexes C(I), C(II) and C. (a) Location of the proton; (b)-(d): (left) Oδ1···H distances plotted as a function of time and (right) final (thick line) and starting (thin line) structures of complexes C(I) (b), C(II) (c) and C (d). In (d) (left) only the last 0.9 ps are shown for the sake of clarity. (Reproduced with permission from ref. [100], Copyright 2000 Wiley.)

What, then, are the key interactions stabilizing the conformation found in the experimental structures? A detailed inspection of the active site suggests the strong dipoles of the Thr26(26')-Gly27(27') peptide units as important factors, as they point towards the negative charge of the Asp dyad. Indeed, calculation of the quantum-mechanically derived electrostatic potential of the aspartyl dyad reveals a striking alignment of the peptide unit dipoles with the Asp charge (Fig. 7). The resulting electrostatic interaction turns out to be rather large (an estimate from a point charge model is -7.8 kcal/mol [100]). The simulation in which the peptide link is included, Complex C (Fig. 5b), confirms the fundamental role of the charge-dipole interactions. Indeed, the system turns out to be stable over the relatively long time range explored (over 4.5 ps; Fig. 6d): the coplanarity of the Asp dyad and the dipole-charge interactions are well maintained, and proton hopping between the carboxylate groups is observed.

Figure 7: HIV-1 PR: Thr26-Gly27 peptide unit's dipole (calculated and experimental (Nelson RD, et al. Nat'l. Bur. Stands. 10, 1967) values 3.82 D and 3.84 D, respectively) superimposed on the electrostatic potential of the Asp dyad active site. The coloring varies continuously from red in negative areas to blue in more positive regions. (Reproduced with permission from ref. [100], Copyright 2000 Wiley.)

Our calculations provide no support for the existence of a deprotonated form at pH 5-6, while they show that the mono-protonated state, in which the Asp dyad shares one proton, is rather stable, in agreement with previous findings [101, 106]. In the most stable protomer C, a water molecule forms two H-bonds to the Asp carboxylates. The close proximity of the two carboxylates is achieved by forming an LBHB, which overcomes the repulsion of the two negative residues [74, 108]. The peculiar orientation of the two Asp residues is obtained through the interaction of the aspartyl negative charge with the dipoles of the rather rigid Thr26(26')-Gly27(27') peptide units. Recent site-directed mutagenesis experiments at positions 27 and 27', which show the complete loss of catalytic power in the G27V, G27'V HIV-1 PR mutant [116], are consistent with the crucial role of this dipole. Indeed, replacing glycine with the bulky side chain of valine may cause a significant rearrangement of the backbone and thus of this dipole. This in turn may stabilize a conformation of the Asp dyad which is not optimal for the catalytic action of the enzyme.
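The N-H···Oδ1 angles discussed above are ordinary three-point angles with the hydrogen at the vertex. A minimal sketch of how they can be evaluated from Cartesian coordinates follows; the function name and the coordinates in the usage lines are hypothetical, and the snippet does not reproduce the structural analysis of refs. [103,109-115].

```python
import math
from typing import Sequence


def angle_deg(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Angle a-b-c in degrees with b at the vertex (for N-H...O, b is the H atom)."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(u, v))
    cos_t = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


if __name__ == "__main__":
    # Hypothetical coordinates (Angstrom) for an N-H...O contact.
    n, h, o = (0.0, 0.0, 0.0), (1.01, 0.0, 0.0), (2.6, 1.1, 0.0)
    print(f"d(H...O) = {math.dist(h, o):.2f} A, angle(N-H...O) = {angle_deg(n, h, o):.0f} deg")
```

A perfectly linear hydrogen bond gives 180°, so the 125-153° range found in the crystal structures corresponds to clearly bent contacts.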
The ab initio MD simulations indicate that several ingredients, such as polarization forces, the treatment of bond-forming/breaking processes and temperature effects, play a crucial role in the HIV-1 PR active site. These features are expected to be equally important in the adducts with substrates and inhibitors. Based on these findings, specific force fields could now be developed for this system, which in turn might allow more accurate modeling of HIV-1 PR–drug interactions.
5.3.2
HIV-1 Reverse Transcriptase
Drug effectiveness in anti-AIDS therapy is severely limited by the capability of the virus to develop mutations that ultimately lead to drug resistance [117, 118]. The spectrum of alterations is rather broad for both HIV-1 PR and reverse transcriptase (RT), as evidenced by genetic and biochemical studies performed in the laboratory or in clinical trials [119, 120]. Single mutations effective against drug action are usually followed sequentially by 3–4 additional mutations, so that several highly resistant mutation patterns are observed. Understanding at the molecular level how mutations confer drug resistance can therefore lead to the design of new drugs and therapeutic strategies that are more effective against AIDS.
Figure 8: HIV-1 RT: nucleotide binding site (right) and proton transfer between the γ-phosphate and Lys65, superimposed on the electron localization function (ELF) (Silvi B et al., Nature 1994; 371:683) (left). The ELF is represented in a best-fit plane containing the oxygen, the proton and the lysine nitrogen. Red areas indicate strong localization of the electronic density.
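Proton transfer of the kind shown in Figure 8, like the proton hopping between the Asp carboxylates of HIV-1 PR, is usually monitored along an AIMD trajectory through the coordinate δ = d(H···A) − d(H···B). The sketch below assumes that the per-frame distances have already been extracted from the trajectory. The 0.2 Å hysteresis threshold, which keeps thermal jitter around δ ≈ 0 from being counted as a transfer, is an arbitrary illustrative choice and not a value taken from the original studies.

```python
import numpy as np


def transfer_coordinate(d_h_a, d_h_b):
    """delta = d(H-A) - d(H-B) per frame: negative -> proton on A, positive -> proton on B."""
    return np.asarray(d_h_a, float) - np.asarray(d_h_b, float)


def count_hops(delta, threshold=0.2):
    """Count committed transfers; a hop requires going from <= -threshold to >= +threshold or back."""
    state = 0   # 0: not yet assigned, -1: proton on A, +1: proton on B
    hops = 0
    for x in delta:
        if x <= -threshold:
            new_state = -1
        elif x >= threshold:
            new_state = 1
        else:
            continue  # inside the shared region: keep the previous assignment
        if state != 0 and new_state != state:
            hops += 1
        state = new_state
    return hops


def shared_fraction(delta, threshold=0.2):
    """Fraction of frames with |delta| < threshold, a crude indicator of a shared (LBHB-like) proton."""
    delta = np.asarray(delta, float)
    return float(np.mean(np.abs(delta) < threshold))


if __name__ == "__main__":
    # Hypothetical H-A and H-B distances (Angstrom) for seven frames.
    d_ha = [1.00, 1.15, 1.25, 1.15, 1.55, 1.50, 1.05]
    d_hb = [1.50, 1.25, 1.15, 1.25, 1.05, 1.10, 1.35]
    delta = transfer_coordinate(d_ha, d_hb)
    print(count_hops(delta), shared_fraction(delta))
```

A large shared fraction with few committed hops is what one would expect for a proton delocalized in a low-barrier hydrogen bond; frequent committed hops indicate genuine back-and-forth transfer.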
The recent determination of the crystal structure of a ternary catalytic complex of HIV-1 RT with a substrate (dTTP) and the DNA primer and template [121] (Fig. 8) has provided the structural basis of resistance: most mutations causing resistance to nucleoside-analog drugs were found to be located close to the nucleoside binding site. AIMD calculations were used to characterize the functional role of these residues in resistance against nucleoside-analog drugs [122]. Calculations were carried out for models of the nucleoside binding site with the substrate triphosphate in different protonation states (fully deprotonated and protonated at the γ-phosphate). While the protonated form undergoes large rearrangements already on the picosecond time scale, the fully deprotonated form exhibits a previously unrecognized low-barrier hydrogen bond (LBHB) between Lys65 and the γ-phosphate (Fig. 8). The probable loss of this interaction in K65R HIV-1 RT may be a key factor in the well-known resistance of this mutant to nucleoside analogs such as ddI, ddC and 3TC [123]. Water molecules (not detected in the X-ray structure) form a structured H-bond network at the active site. A well-ordered water molecule emerged as a key factor for substrate recognition by bridging Gln151 and Arg72 with the γ-phosphate. In the Q151M HIV-1 RT mutant, which exhibits cross-resistance to dideoxy-type drugs and AZT [124], loss of the Gln151–water H-bond is expected to destabilize the water position and could therefore affect substrate binding and drug resistance.
5.3.3
Herpes Simplex Virus Type 1 Thymidine Kinase: a Target for Gene-Therapy-Based Anticancer Drugs
Viral herpes simplex type 1 thymidine kinase (HSV1 TK) is a key enzyme in the metabolism of the herpes simplex virus. Its physiological role is to salvage thymidine into DNA metabolism by converting it to thymidine monophosphate:
ATP + d(T) → ADP + d(Tp)
Phosphorylation is achieved by transfer of the γ-phosphate group from ATP to the 5'-OH group of thymidine. Understanding the chemistry of this enzyme is important for applications in the treatment of viral infections and in cancer chemotherapy [125-132]. Recently, we performed an ab initio MD study focused on HSV1 TK–nucleoside interactions [133]. Our goal was to gain a better understanding of the nature of HSV1 TK binding interactions and of its mechanism of action. Our complexes are based on the X-ray structure of the substrate–enzyme adduct [134]. They include the residues fixing the thymine ring (Met128 and Tyr172); the guanidinium group of Arg163, represented by an ammonium ion, is also included because of its important electrostatic role (Fig. 9). Several HSV1 TK–thymine complexes have been considered, differing in the protonation states of the residues and of the substrate. The ab initio MD simulations show that all the complexes investigated are stable on the ultrashort time scale explored. We study binding by calculating the density difference Δρ = ρ_complex − ρ_fragments − ρ_substrate, which describes how the electron density ρ changes upon formation of the complex. Inspection of Δρ for all complexes reveals no charge transfer to or from the substrate (Fig. 10). The O and N atoms of thymine, as well as the Arg163–Tyr172 H-bond, are significantly polarized. Thus, the tyrosine ring appears to polarize the nucleobase, indicating that Tyr172–thymine electrostatic interactions play an important role in binding. This result is consistent with biological data on the Y172F HSV1 TK mutant, which exhibits very low enzymatic activity [135]. In contrast, there is no evidence of polarization effects on the Met128 sulfur atom [136], indicating that sulfur plays only a minor role in binding. That the role of the Met128 sulfur in the binding process is purely hydrophobic and steric has been confirmed by very recent site-directed mutagenesis experiments, which showed that activity is preserved when the Met residue is replaced by another hydrophobic residue such as Ile [135]. Work is in progress to study the binding of the sugar-like chains of fraudulent substrates. The calculations point to a critical role of electrostatic interactions, providing a rationale for enzyme kinetics measurements performed in the lab of Prof. Folkers at the ETH in Zurich.
Figure 9: HSV-1 TK nucleoside binding site (left) and quantum-mechanical model used in the calculations (right). (Reproduced with permission from ref. [133], Copyright 1998 Wiley.)
5.3.4
Conclusion
In conclusion, quantum chemical calculations of this type reveal a variety of functionally important characteristics of drug/target interactions that can neither be discerned by visual inspection of the molecular structure nor be described by standard force fields. Several ingredients, such as polarization forces, the treatment of bond-breaking processes in the LBHB, and temperature effects, play a critical role in drug binding and possibly in drug-resistance mechanisms. Implementing such ingredients in standard force fields and in QSAR parameters is expected to enable a more efficient design of new drugs for these targets.
Figure 10: Electronic density difference in the HSV-1 TK nucleoside binding site: magenta −0.054 e/Å³, green +0.054 e/Å³. (Reproduced with permission from ref. [133], Copyright 1998 Wiley.)
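The density difference Δρ = ρ_complex − ρ_fragments − ρ_substrate used in the HSV1 TK analysis is straightforward to evaluate once the three densities are available on a common real-space grid, all computed at the frozen geometry of the complex. The sketch below assumes such grids as NumPy arrays in e/Å³ with a uniform voxel volume in Å³. The region mask used to integrate Δρ around the substrate is a hypothetical, deliberately simple test for net charge transfer and not necessarily how the analysis in ref. [133] was carried out: a near-zero integral over the substrate region combined with sizable local lobes of Δρ signals polarization without charge transfer.

```python
import numpy as np


def density_difference(rho_complex, rho_fragments, rho_substrate):
    """Delta rho = rho_complex - rho_fragments - rho_substrate on a shared grid (e/Angstrom^3)."""
    return (np.asarray(rho_complex, float)
            - np.asarray(rho_fragments, float)
            - np.asarray(rho_substrate, float))


def integrate(field, voxel_volume, mask=None):
    """Integrate a grid field (sum times voxel volume), optionally restricted to a boolean mask."""
    field = np.asarray(field, float)
    if mask is not None:
        field = field[mask]
    return float(field.sum() * voxel_volume)


def isosurface_voxels(delta_rho, level):
    """Count voxels with Delta rho >= +level (density gain) and <= -level (density loss)."""
    return int(np.sum(delta_rho >= level)), int(np.sum(delta_rho <= -level))


if __name__ == "__main__":
    shape, dv = (4, 4, 4), 0.125            # hypothetical toy grid and voxel volume
    rho_frag = np.full(shape, 0.10)
    rho_sub = np.full(shape, 0.20)
    rho_cplx = rho_frag + rho_sub
    rho_cplx[0, 0, 0] += 0.06               # toy polarization inside the substrate region
    rho_cplx[0, 0, 1] -= 0.06
    d_rho = density_difference(rho_cplx, rho_frag, rho_sub)
    substrate_region = np.zeros(shape, bool)
    substrate_region[0] = True
    print(integrate(d_rho, dv, substrate_region), isosurface_voxels(d_rho, 0.054))
```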
5.4
Rational Design of Biomimetic Catalysts by Hybrid QM/MM Car-Parrinello Simulations of Galactose Oxidase
In millions of years of evolution, nature has developed a remarkably elegant and subtle in vivo chemistry. Reactions are generally performed under very mild conditions, with high efficiency and (stereo)selectivity. It is therefore not surprising that considerable research effort is devoted to understanding the principles governing enzymatic catalysis and to developing small synthetic compounds able to mimic this natural chemistry [137]. However, the search for simple synthetic models is difficult, and only very few functional biomimetic compounds exist so far. One of the factors that hampers the successful design of synthetic analogs is the great complexity of biological systems, which makes it almost impossible to pinpoint all the features of the active site that have to be included in a biomimetic analog. Accurate and realistic computer modeling of the enzymatic process could, in principle, be used to map out these crucial factors. In computer experiments, the influence of different active-site residues can be probed easily, and environment and temperature effects can be assessed. To probe the capabilities of AIMD for this purpose, we have chosen the mononuclear copper enzyme galactose oxidase (GOase, 68 kDa, 639 amino acids), for which functional biomimetic models have recently been synthesized (see also the chapter on radical enzymes by F. Himo and L. Eriksson in this book). GOase is an extracellular enzyme secreted by the fungus Dactylium dendroides that oxidizes primary alcohols to the corresponding aldehydes with simultaneous reduction of molecular oxygen to hydrogen peroxide [138]. This reaction is performed for a wide range of substrates with strict regio- and stereoselectivity, properties that render this system of considerable interest for bioanalytical [139] and synthetic applications. The X-ray structure was solved in 1991 [140, 141] and showed that the Cu2+ ion is coordinated by two nitrogen and two oxygen atoms from aromatic residues (His496, His581, Tyr272 and Tyr495) and by an external fifth ligand (water or acetate) (Figure 11a). No other cofactor that could provide the second redox equivalent for the catalyzed two-electron oxidation is present in the structure. However, one of the tyrosine ligands (Tyr272) forms a very unusual covalent thioether linkage with Cys228, suggesting that Tyr272 might provide the second redox center by forming a free radical stabilized via delocalization onto Cys228. This ligand-based radical mechanism has been confirmed by EPR measurements characteristic of a Cu(II) site close to an organic radical and by the EPR spectrum of the apo enzyme [142-144]. Many biomimetic model compounds have been designed for GOase [145-157]. Despite a high similarity in structural and/or magnetic properties, most of these synthetic analogs show no catalytic activity.
Figure 11: (a) Schematic representation of the active site of galactose oxidase (GOase) in comparison with (b) the biomimetic model compound [156]. (Reproduced with permission from ref. [159], Copyright 2000 Springer.)
Very recently, however, two groups have succeeded in synthesizing functional models of GOase [156, 157]. These new biomimetic compounds enable a novel synthetic route for the conversion of primary alcohols to aldehydes, and they also constitute well-defined model systems for investigating the underlying reaction mechanism. However, the synthetic models exhibit a reactivity several orders of magnitude below that of GOase. This drastic difference calls for an approach that can identify the essential factors governing the enzymatic reaction that are still missing in the mimetic system.
Figure 12: Schematic representation of the proposed catalytic cycle [138]. Labels: A, resting state; B, protonated intermediate; C, transition state of the H-abstraction step; D, product of the abstraction step; semi, semi-reduced form; ox, oxidized form. (Reproduced with permission from ref. [159], Copyright 2000 Springer.)
We have performed a parallel theoretical study of the enzyme and one of its synthetic analogs [156] (Figure 11) aimed at characterizing the main catalytic differences [158, 159]. Several key structures of the catalytic cycle (Figure 12) have been investigated for the synthetic analog in direct comparison with the natural target.
To capture the enzymatic system in its full complexity, we have adopted a mixed quantum/classical (QM/MM) Car-Parrinello approach [160], in which the active-site residues (Figure 11a) are treated quantum mechanically (within the framework of density functional theory) and the rest of the protein is described with an empirically derived force field. In contrast to pure gas-phase models of the active site, such an approach makes it possible to assess the influence of the protein environment and to capture finite-temperature and solvent effects. We have compared the two systems along the catalytic cycle (Figure 12) by characterizing the semi-reduced and oxidized forms of the resting state, A^semi and A^ox, the protonated intermediate B, the transition state C of the rate-determining hydrogen abstraction, and the final product D of the abstraction step. We find that the overall features of the mimetic compound are qualitatively remarkably similar to those of its natural target. For both systems, the semi-reduced resting state A^semi is characterized by an unpaired electron localized in a d(x²−y²) orbital at the Cu(II) center (Figure 13), while the catalytically active species A^ox (Figure 14), the protonated intermediate B (Figure 15) and the transition state of the hydrogen abstraction step C (Figure 16) form antiferromagnetically coupled diradical states. In A^ox, B and C, one electron remains localized on the Cu(II) ion, whereas the localization of the second electron, of opposite spin, changes several times throughout the cycle. All variations of the β-spin distribution, from localization on the axial tyrosine Tyr495 in A^ox, to the equatorial tyrosine Tyr272 in B, to the alcohol substrate in C, are closely matched by the synthetic active-site analog.
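The QM/MM partitioning described above is commonly written as an additive energy expression. The block below is a generic sketch of that form with electrostatic embedding, in atomic units; it is not the specific coupling scheme of ref. [160], and the symbols are our own labels: n(r) is the electron density of the quantum region, Z_I are the nuclear (or pseudopotential core) charges of the QM atoms, q_j are the force-field point charges of the MM atoms, and the van der Waals and bonded terms stand for the remaining classical QM–MM interactions.

```latex
E_{\mathrm{tot}} = E_{\mathrm{QM}}[n] + E_{\mathrm{MM}} + E_{\mathrm{QM/MM}},
\qquad
E_{\mathrm{QM/MM}} =
  - \sum_{j \in \mathrm{MM}} q_j \int \frac{n(\mathbf{r})}{|\mathbf{r}-\mathbf{R}_j|}\,d\mathbf{r}
  + \sum_{I \in \mathrm{QM}} \sum_{j \in \mathrm{MM}} \frac{Z_I\, q_j}{|\mathbf{R}_I-\mathbf{R}_j|}
  + E^{\mathrm{QM/MM}}_{\mathrm{vdW}} + E^{\mathrm{QM/MM}}_{\mathrm{bonded}}
```

The first two coupling terms are what allow the protein's electrostatic field to polarize the active-site electron density, which is the effect probed when the protein field outside the quantum region is switched on or off in computer experiments.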
However, we have also found a number of intrinsic differences between the natural and the synthetic compound, which can be summarized as follows: (i) Throughout the catalytic cycle, the active site of GOase undergoes only very small geometric changes. The RMS deviations among all the investigated structures A–D are smaller than 0.01 Å for all of the ~70–80 atoms of the quantum region of the active site. In the biomimetic system, on the other hand, at least two significant structural rearrangements occur: one upon substrate binding and another upon product formation in the abstraction step. (ii) Substrate binding in the mimetic system seems to be hampered by the alkyl residues of the thioether groups in the ortho position of the equatorial oxygen ligand. Since the alcohol substrate is only weakly bound prior to deprotonation, the energy needed to induce a conformational change further disfavors formation of the substrate complex. (iii) Adjacent oxygen- and nitrogen-containing aromatic ligands of the biomimetic compound form an angle of 50° in the resting state. However, protonation of the axial ligand and formation of the product D favor an extended conjugated system in which both ligand systems are essentially coplanar. This energetically favorable competing configuration leads to large structural changes and induces the formation of a linear N–Cu(I)–O product in which the aldehyde is tightly bound and cannot be released as easily as from the corresponding, weakly bound GOase product (Figure 17). (iv) The activation barrier we calculate for the natural system is 16 kcal/mol, in close agreement with the value of 14 kcal/mol estimated from the experimental turnover rate of 800 s⁻¹ [161].
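The RMS deviations quoted in point (i) compare structures of the same set of quantum-region atoms. The text does not state whether the structures were superimposed first; the sketch below assumes they are, using the standard Kabsch algorithm (optimal rotation obtained from a singular value decomposition of the covariance matrix) before the RMSD is computed. Coordinates are assumed to be N×3 arrays in Å with atoms listed in the same order, and the 75-atom example is randomly generated, not taken from the GOase simulations.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two N x 3 coordinate sets after optimal superposition (Kabsch algorithm)."""
    p = np.asarray(p, float)
    q = np.asarray(q, float)
    p = p - p.mean(axis=0)                 # remove translation
    q = q - q.mean(axis=0)
    cov = p.T @ q                          # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(cov)
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0   # exclude improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    p_fit = p @ rot.T                      # apply the rotation to each row vector
    return float(np.sqrt(np.mean(np.sum((p_fit - q) ** 2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.normal(size=(75, 3))      # hypothetical 75-atom quantum region
    angle = 0.7
    rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
    moved = coords @ rz.T + np.array([1.0, -2.0, 0.5])
    print(kabsch_rmsd(coords, moved))      # rigid motion only, so the RMSD is ~0
```

Superimposing first ensures that the reported RMSD measures internal geometric change of the active site rather than overall translation or rotation of the quantum region during the simulation.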
Figure 14: Contour plots of the unpaired electron density distribution in the oxidized form of the resting state of (a) GOase and (b) the biomimetic compound. Contours are drawn at 0.002 e/au³. Yellow and magenta refer to α- and β-spin densities, respectively. (Reproduced with permission from ref. [159], Copyright 2000 Springer.)
Figure 13: Contour plots of the unpaired electron density distribution in the semi-reduced form of the resting state of (a) GOase and (b) the biomimetic compound (contour at 0.02 e/au³). (Reproduced with permission from ref. [159], Copyright 2000 Springer.)
Figure 15: Comparison of the unpaired electron density distribution of the protonated intermediate B of (a) GOase and (b) the biomimetic compound (contour at 0.008 e/au³). Yellow and magenta refer to α- and β-spin densities, respectively. (Reproduced with permission from ref. [159], Copyright 2000 Springer.)
Figure 16: Comparison of the unpaired electron density distribution in the transition state for hydrogen abstraction (C) for (a) GOase and (b) the biomimetic compound. Contours are drawn at two different levels: 0.0015 e/au³ (upper half of Figure 16) and 0.001 e/au³ (lower half). Yellow and magenta refer to α- and β-spin densities, respectively. (Reproduced with permission from ref. [159], Copyright 2000 Springer.)
Figure 17: Structure of the product D of the abstraction step for (a) GOase and (b) the biomimetic compound. The long coordination bonds to Tyr495 and the substrate are indicated by dashed lines. (Reproduced with permission from ref. [159], Copyright 2000 Springer.)
The corresponding value for the synthetic system, 21 kcal/mol, is distinctly higher, consistent with its much lower catalytic activity (turnover numbers for aromatic substrates are ~0.02 s⁻¹). In both systems, the second unpaired electron, which is localized on the equatorial oxygen ligand in B, is here mainly located on the substrate itself (Figure 16, upper half). This finding offers a first explanation for the experimental observations that GOase is several orders of magnitude more efficient in converting aromatic than aliphatic substrates, and that the synthetic system converts only aromatic but not aliphatic alcohols [156]. The strong concentration of the unpaired spin density on the alcohol substrate in the transition state suggests that the experimentally observed differences in reactivity arise because aromatic substrates form more stable radical intermediates, owing to the additional delocalization of the unpaired electron density. A closer inspection shows that in C the unpaired spin density on the substrate is smaller in the natural system (0.6 e) than in its mimic (0.7 e). In GOase, the unpaired β-spin density is delocalized to some extent over the equatorial tyrosine and the covalent sulfur link, whereas at the same contour level there is no net spin population on the equatorial ligand of the mimetic system. For the synthetic analog, the integrated unpaired spin density is lower than 0.01 e for every atom of the equatorial ligand system, while the corresponding values in GOase typically range from 0.01 to 0.02 e per atom. The total unpaired electron density of the equatorial ligand is roughly twice as large in the natural compound, which provides a first rationale for the discrepancy in barrier height. The sulfur-containing ligand has almost no radical character in the biomimetic compound, in contrast to the natural system. This agrees with the experimental observation that the covalent sulfur link plays an important role in the catalytic function of GOase [162], whereas sulfur substituents have little or no effect in the synthetic compounds [150, 156]. The subtle electronic differences between the natural and the synthetic system are caused by a particular difference in their geometric properties. In both the natural and the mimetic system, all the essential orbitals hosting the unpaired β-spin density are coplanar with the d(x²−y²) orbital at the copper. However, owing to the perpendicular orientation of Tyr272 in GOase, the pz orbitals of its aromatic system and of the covalently linked sulfur atom can easily overlap with these orbitals, whereas the different orientation of the equatorial ligand in the mimic renders them orthogonal. We have performed a series of computer experiments to evaluate the decisive factors involved in enzymatic catalysis. The protein field outside the quantum region has only a relatively small effect.
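Per-atom spin populations such as the 0.01–0.02 e per atom quoted above require partitioning the spin density ρα − ρβ among the atoms. The text does not specify the partitioning scheme; the sketch below assumes the simplest real-space choice, assigning every grid point to its nearest nucleus (a Voronoi partition) and summing (ρα − ρβ)·ΔV per atom. The grid, the density values and the units (for example e/bohr³ and bohr³, matching the e/au³ contour levels in the figures) are illustrative assumptions.

```python
import numpy as np


def atomic_spin_populations(grid_points, spin_density, atom_positions, voxel_volume):
    """Voronoi (nearest-nucleus) integration of rho_alpha - rho_beta; returns one population per atom."""
    pts = np.asarray(grid_points, float)          # (N, 3) grid point coordinates
    atoms = np.asarray(atom_positions, float)     # (M, 3) nuclear coordinates
    spin = np.asarray(spin_density, float)        # (N,) spin density at each grid point
    d2 = np.sum((pts[:, None, :] - atoms[None, :, :]) ** 2, axis=-1)   # (N, M) squared distances
    owner = np.argmin(d2, axis=1)                 # index of the nearest atom for each grid point
    return np.bincount(owner, weights=spin * voxel_volume, minlength=len(atoms))


if __name__ == "__main__":
    atoms = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]    # hypothetical two-atom fragment
    grid = [[0.2, 0.0, 0.0], [0.5, 0.0, 0.0], [1.5, 0.0, 0.0], [1.8, 0.0, 0.0]]
    spin = [0.1, 0.1, -0.05, 0.0]
    print(atomic_spin_populations(grid, spin, atoms, voxel_volume=1.0))
```

The full N×M distance matrix is fine for small grids; for a realistic density grid one would process the points in chunks or use a spatial tree, and other partitions (Bader, Hirshfeld, Mulliken) can give noticeably different per-atom numbers.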
Most of the crucial properties seem to be determined by the geometric and electronic features of approximately 100 atoms of the active site. Thus, it should indeed be possible to construct small synthetic analogs that mimic the enzymatic chemistry with high fidelity. Our study provides direct mechanistic information that can help in the future design of GOase mimics with increased efficiency or selectivity.
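The comparison between the computed barriers (16 and 21 kcal/mol) and the measured turnover numbers (800 s⁻¹ for GOase, ~0.02 s⁻¹ for the synthetic compound with aromatic substrates) rests on transition-state theory. The sketch below uses the Eyring equation k = κ (k_B T/h) exp(−ΔG‡/RT) with an assumed transmission coefficient κ = 1. The temperature is also an assumption, since none is given in the text, and treating a turnover number as a single first-order rate constant is a simplification.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J*s
R_KCAL = 1.987204e-3       # gas constant, kcal/(mol*K)


def eyring_barrier(rate, temperature=298.15, kappa=1.0):
    """Free-energy barrier (kcal/mol) implied by a first-order rate constant (1/s)."""
    prefactor = kappa * K_B * temperature / H_PLANCK
    return R_KCAL * temperature * math.log(prefactor / rate)


def eyring_rate(barrier, temperature=298.15, kappa=1.0):
    """First-order rate constant (1/s) for a free-energy barrier given in kcal/mol."""
    prefactor = kappa * K_B * temperature / H_PLANCK
    return prefactor * math.exp(-barrier / (R_KCAL * temperature))


if __name__ == "__main__":
    for t in (298.15, 310.0):
        print(f"T = {t} K: GOase {eyring_barrier(800.0, t):.1f} kcal/mol, "
              f"mimic {eyring_barrier(0.02, t):.1f} kcal/mol")
    # Rate ratio implied by a 5 kcal/mol barrier difference at room temperature.
    print(eyring_rate(16.0) / eyring_rate(21.0))
```

Under these assumptions, 800 s⁻¹ corresponds to roughly 13.5 kcal/mol at 298 K and 14 kcal/mol at 310 K, and 0.02 s⁻¹ to roughly 20 kcal/mol; a 5 kcal/mol difference in barrier height translates into a rate ratio of a few thousand at room temperature.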
6
OUTLOOK
This review has shown the power of AIMD to describe biochemical problems. Complex enzymatic processes (such as catalytic reactions and drug binding) can be followed directly at the molecular level, and many valuable insights can be gained from such in situ studies. AIMD and Hybrid/AIMD simulations certainly constitute a promising novel tool for the ab initio modeling of biological processes. However, owing to the great complexity of these systems, technical and fundamental reasons still limit their domain of application. The system-size problem necessitates mixed QM/MM approaches, which in the future might be complemented by linear-scaling methods. The most severe of the remaining limitations, however, is the time scale of a few tens of picoseconds over which the system can be sampled. The combination of AIMD and Hybrid/AIMD simulations with enhanced sampling techniques [163] can therefore be expected to multiply the power of this approach. The rapid ongoing progress in the development of new algorithms and computer architectures makes us confident that AIMD and Hybrid/AIMD methods will add a new dimension to the simulation of biological processes.
Acknowledgments. It is a pleasure to thank all the people who have contributed to this review, in particular Frank Alber, Karel Doclo, Stefano Piana, Lorenzo De Santis and Marialore Sulpizi. We also acknowledge fruitful collaborations with Wanda Andreoni and Gerd Folkers. We are indebted to Erio Tosatti and Michael Klein for many useful discussions. Finally, we would like to thank Michele Parrinello for his continuous support.
References
[1] Car R, Parrinello M, Phys Rev Lett 55:2471 1985
[2] Ballone P, Andreoni W, Car R, Parrinello M, Phys Rev Lett 60:271-274 1988
[3] Röthlisberger U, Andreoni W, J Chem Phys 94:8129 1991
[4] Car R, Parrinello M, Phys Rev Lett 60:204-207 1988
[5] See, e.g., (a) Nusterer E, Blöchl PE, Schwarz K, Angew Chem Int Ed 35:175 1996; (b) Charlier JC, De Vita A, Blase X, Car R, Science 275:646 1997
[6] See, e.g., (a) Röthlisberger U, Klein ML, J Am Chem Soc 117:42 1995; (b) Röthlisberger U, Sprik M, Klein ML, J Chem Soc Faraday Trans 94:501 1998; (c) Doclo K, Röthlisberger U, Chem Phys Lett 297:205 1998
[7] See, e.g., (a) Boero M, Parrinello M, Terakura K, J Am Chem Soc 120:2746 1998; (b) Hass KC, Schneider WF, Curioni A, Andreoni W, Science 282:882 1998; (c) Boero M, Parrinello M, Hüffer S, Weiss H, J Am Chem Soc 122:501 2000
[8] Sprik M, Hutter J, Parrinello M, J Chem Phys 105:142 1996 and references therein
[9] See, e.g., (a) Molteni C, Parrinello M, J Am Chem Soc 120:2168 1998; (b) Brugé F, Bernasconi M, Parrinello M, J Am Chem Soc 121:10883 1999; (c) Alber F, Folkers G, Carloni P, J Mol Struct (Theochem) 489:237 1999
[10] See, e.g., (a) Curioni A, Sprik M, Andreoni W, Schiffer H, Hutter J, Parrinello M, J Am Chem Soc 119:7218 1997; (b) Meijer EJ, Sprik M, J Phys Chem A 102:2893 1998; (c) Meijer EJ, Sprik M, J Am Chem Soc 120:6345 1998
[11] Carloni P, Blöchl PE, Parrinello M, J Phys Chem 99:1338 1995
[12] Blöchl PE, Parrinello M, Phys Rev B 45:9413 1992; Kresse G, Hafner J, J Non-Cryst Solids 156-158:956 1993; Alavi A, Kohanoff J, Parrinello M, Frenkel D, Phys Rev Lett 73:2599 1994; VandeVondele J, De Vita A, Phys Rev B 60:13241 1999
[13] Nosé S, Mol Phys 52:255 1984; Hoover WG, Phys Rev A 31:1695 1985
[14] Melchionna S, Ciccotti G, Holian BL, Mol Phys 78:533 1993
[15] Parrinello M, Rahman A, Phys Rev Lett 45:1196 1980
[16] Focher P, Chiarotti GL, Bernasconi M, Tosatti E, Parrinello M, Europhys Lett 26:345 1994; Bernasconi M, Chiarotti GL, Focher P, Scandolo S, Tosatti E, Parrinello M, J Phys Chem Solids 56:501 1995
[17] Marx D, Parrinello M, Z Phys B 95:143 1994; Marx D, Parrinello M, J Chem Phys 104:4077 1996; Tuckerman ME, Marx D, Klein ML, Parrinello M, J Chem Phys 104:5579 1996; Martyna GJ, Hughes A, Tuckerman ME, J Chem Phys 110:3275 1999
[18] Blöchl PE, J Chem Phys 103:7422 1995; Marx D, Fois E, Parrinello M, Int J Quantum Chem 57:655 1996; Martyna GJ, Tuckerman ME, J Chem Phys 110:2810 1999
[19] Hammes-Schiffer S, Andersen HC, J Chem Phys 99:523 1993
[20] Hartke B, Carter EA, Chem Phys Lett 189:358 1992
[21] Hartke B, Carter EA, J Chem Phys 97:6569 1992
[22] Blöchl PE, Phys Rev B 50:17953 1994
[23] Lippert G, Hutter J, Parrinello M, Mol Phys 92:477 1997
[24] Woo TK, Margl PM, Blöchl PE, Ziegler T, J Phys Chem B 101:7877 1997; Eichinger M, Tavan P, Hutter J, Parrinello M, J Chem Phys 110:10452 1999
[25] Hutter J, Carloni P, Parrinello M, J Am Chem Soc 118:8710 1996
[26] Carloni P, Sprik M, Andreoni W, J Phys Chem B 104:823 2000
[27] Sagnella DE, Laasonen K, Klein ML, Biophys J 71:1172 1996
[28] Carloni P, Andreoni W, Parrinello M, Phys Rev Lett 79:761 1997
[29] Florian J, Baumruk V, Strs M, Bedrs163 SJ, J Phys Chem 100:1559
[30] Carloni P, Andreoni W, Hutter J, Curioni A, Giannozzi P, Parrinello M, Chem Phys Lett 234:50 1995
[31] Tolari E, Carloni P, Andreoni W, Hutter J, Parrinello M, Chem Phys Lett 234:469
[32] Carloni P, Andreoni W, J Phys Chem 100:17797
[33] Comba P, Hambley T, "Molecular Modeling of Inorganic Compounds", VCH, Weinheim, 1995
[34] Rovira C, Kunc K, Hutter J, Ballone P, Parrinello M, J Phys Chem A 101:8914 1997
[35] Rovira C, Ballone P, Parrinello M, Chem Phys Lett 271:247 1997
[36] Rovira C, Kunc K, Hutter J, Ballone P, Parrinello M, Int J Quantum Chem 69:31 1998
[37] Rovira C, Parrinello M, Int J Quantum Chem 70:387 1998
[38] Rovira C, Parrinello M, Chem Eur J 5:250 1999
[39] Rovira C, Parrinello M, Biophys J 78:93 2000
[40] Rovira C, Carloni P, Parrinello M, J Phys Chem B 103:7031 1999
[41] Segall MD, Payne MC, Ellis SW, Tucker GT, Boyes RN, Xenobiotica 28:15 1998
[42] Segall MD, Payne MC, Ellis SW, Tucker GT, Boyes RN, Phys Rev E 57:4618 1998
[43] Segall MD, Payne MC, Ellis SW, Tucker GT, Boyes RN, Chem Res Toxicol 11:962 1998
[44] Segall MD, Payne MC, Ellis SW, Tucker GT, Eddershaw PJ, Xenobiotica 29:561 1999
[45] Marchi M, Hutter J, Parrinello M, J Am Chem Soc 118:7847 1996
[46] Bifone A, de Groot HJM, Buda F, Chem Phys Lett 248:165 1996
[47] Buda F, de Groot HJM, Bifone A, Phys Rev Lett 77:5405 1996
[48] Bifone A, de Groot HJM, Buda F, J Phys Chem B 101:2954 1997
[49] Bifone A, de Groot HJM, Buda F, Pure Appl Chem 69:2105 1997
[50] La Penna G, Buda F, Bifone A, de Groot HJM, Chem Phys Lett 294:447 1998
[51] Molteni C, Frank I, Parrinello M, J Am Chem Soc 121:12177 1999
[52] Frank I, Hutter J, Marx D, Parrinello M, J Chem Phys 108:4060 1998
[53] Karplus M, Petsko GA, Nature 347:631-639 1990
[54] Tashian RE, BioEssays 10:186 1989
[55] Xue Y, Liljas A, Jonsson BH, Lindskog S, Proteins: Str Func Gen 17:93 1993
[56] Silverman DN, Tu C, Chen X, Tanhauser SM, Kresge AJ, Laipis PJ, Biochemistry 32:10757 1993
[57] For computational details and additional information see Röthlisberger U, ACS Symp Ser 712:264-274, Am Chem Soc, Washington, DC 1998
[58] Cieplak P, Bayly CI, Gould IR, Merz KM Jr, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA, J Am Chem Soc 117:5179 1995
[59] van Gunsteren WF, Billeter SR, Eising AA, Hünenberger PH, Krüger P, Mark A, Scott WRP, Tironi IG, GROMOS96, BIOMOS, Zürich and Groningen 1996
[60] Håkansson K, Carlsson M, Svensson LA, Liljas A, J Mol Biol 227:1192 1992
[61] Collyer SMR, McMahon TB, J Phys Chem 87:909 1983
[62] Lias SG, Bartmess JE, Liebman JF, Holmes JL, Mallard WG, J Phys Chem Ref Data 17, Suppl 1 1988
[63] Merz KM Jr, J Am Chem Soc 113:406 1991
[64] Fersht A, "Enzyme Structure and Mechanism", 2nd ed, New York: WH Freeman; 1985 p 327
[65] Kraut J, Annu Rev Biochem 46:331 1977
[66] Stroud RM, Sci Am 231:74 1974
[67] Branden C, Tooze J, "Introduction to Protein Structure", 2nd ed, New York: Garland; 1999 p 410
[68] Matheson NR, van Halbeek H, Travis J, J Biol Chem 266:13489 1991
[69] Steitz TA, Shulman RG, Annu Rev Biochem Biophys 11:419 1982
[70] Blow DM, Birktoft JJ, Hartley BS, Nature 221:337 1969
[71] Matthews BW, Sigler PB, Henderson R, Blow DM, Nature 214:652 1967
[72] Lin J, Cassidy CS, Frey PA, Biochemistry 37:11940 1998
[73] Cassidy CS, Lin J, Frey PA, Biochemistry 36:4576 1997
[74] Frey PA, Whitt SA, Tobin JB, Science 264:1927 1994
[75] Bryan P, Pantoliano MW, Quill SG, Hsiao HY, Poulos T, Proc Natl Acad Sci USA 83:3743-3745 1986
[76] Hwang JK, Warshel A, Biochemistry 26:2669 1987
[77] Warshel A, Naray-Szabo G, Sussman F, Hwang JK, Biochemistry 28:3629 1989
[78] Wells JA, Cunningham BC, Craycar TP, Estell DA, Phil Trans R Soc Lond A 317:415 1986
[79] Lin J, Westler WM, Cleland WW, Markley JL, Frey PA, Proc Natl Acad Sci USA 95:14664 1998
[80] Halkides CJ, Wu YQ, Murray CJ, Biochemistry 35:15941 1996
[81] Warshel A, Papazyan A, Kollman PA, Science 269:102 1995
[82] Warshel A, J Biol Chem 273:27035 1998
[83] For additional information, see De Santis L, Carloni P, Proteins: Str Func Gen 37:611 1999
[84] Models based on the structure of pancreatic elastase complexed with Ace-Ala-Pro-Val-difluoro-N-phenylethylacetamide: Takahashi LH, Radhakrishnan R, Rosenfield RE, J Am Chem Soc 111:3368 1989
[85] Rao SN, Singh UC, Bash PA, Kollman PA, Nature 328:551 1987
[86] Jeffrey GA, Saenger W, "Hydrogen Bonding in Biological Structures", Berlin: Springer-Verlag; 1991
[87] See, e.g., (a) "3D QSAR in Drug Design: Ligand-Protein Interactions and Molecular Similarity", Kubinyi H, Folkers G, Martin YC (Eds), Kluwer/Escom, Dordrecht-Boston-London, 1998; (b) "Structure-based drug design: computational advances", Marrone JM, Briggs JM, McCammon A, Annu Rev Pharmacol Toxicol 37:71 1997; (c) "Computer-Aided Molecular Design: Theory and Application", Doucet JP, Weber J, Academic Press, London, 1996
[88] Davies DR, Annu Rev Biophys Biophys Chem 19:189 1990
[89] Fitzgerald PMD, Springer JP, Annu Rev Biophys Biophys Chem 20:299 1991
[90] Todd MJ, Semo N, Freire E, J Mol Biol 283:475 1998
[91] Harte WE, Swaminathan S, Mansuri MM, Martin JC, Rosenberg IE, Beveridge DL, Proc Natl Acad Sci USA 87:8864 1990
[92] Harte WE, Swaminathan S, Beveridge DL, Proteins: Str Func Gen 12:175 1992
[93] York DM, Darden TA, Pedersen LG, Anderson MW, Biochemistry 32:1443 1993
[94] Wlodawer A, Vondrasek J, Annu Rev Biophys Biomol Struct 27:249 1998
[95] Chatfield DC, Brooks BR, J Am Chem Soc 117:5561 1995
[96] Straatsma TP et al., in "Computer Simulations of Biomolecular Systems", van Gunsteren WF, Weiner PK, Wilkinson AJ (Eds), ESCOM, Leiden, p 363, 1993
[97] Liu H, Müller-Plathe F, van Gunsteren WF, J Mol Biol 261:454 1996
[98] Geller M, Miller M, Swanson SM, Maizel J, Proteins: Str Func Gen 27:195 1997
[99] Harrison RW, Weber IT, Prot Eng 7:1353 1994
[100] For a description of computational details and additional information see Piana S, Carloni P, Proteins: Str Func Gen, in press (2000)
[101] Hyland LJ, Tomaszek TA Jr, Roberts GD, Carr SA, Magaard VW, Bryan HL, Fakhoury SA, Moore ML, Minnich MD, Culp JS, DesJarlais RL, Meek TD, Biochemistry 30:8454 1991
[102] Smith R, Brereton IM, Chai RY, Kent SBH, Nature Struct Biol 3:946 1996
[103] McKeever BM, Navia MA, Fitzgerald PM, Springer JP, Leu CT, Heimbach JC, Herbert WK, Sigal IS, Darke PL, J Biol Chem 264:1919 1989
[104] Beveridge AJ, Heywood GC, Biochemistry 32:3325 1993
[105] Polgar L, Szeltner Z, Boros I, Biochemistry 33:9351 1994
[106] Trylska J, Antosiewicz J, Geller M, Hodge CN, Klabe RM, Head MS, Gilson MK,
[107] Berendsen HJC, Mavri J, in "Theoretical Treatments of Hydrogen Bonding", Hadzi D (Ed), p 119, 1997
[108] Cleland WW, Kreevoy MM, Science 264:1887 1994
[109] Miller M, Schneider J, Sathyanarayana BK, Toth MV, Marshall GR, Clawson L, Selk L, Kent SBH, Wlodawer A, Science 246:1149 1989
[110] Erickson J, Neidhart DJ, VanDrie J, Kempf DJ, Wang XC, Norbeck DW, Plattner JJ, Rittenhouse JW, Turon M, Wideburg N, Kohlbrenner WE, Simmer R, Helfrich R, Paul DA, Knigge M, Science 249:527 1990
249 [111] Suguna K, Padlan EA, Smith CW, Carlson WD, Davies DR, Proc Nat'l Acad Sci (USA) 84:7009 1987 [112] Silva AM, Cachau RE, Sham HL, Erickson JW, J Mol Biol 255:321 1996 [113] Kempf DJ, Marsh KC, Denissen JF, McDonald E, Vasavanonda S, Flentge CA, Green BE, Fino L, Park CH, Kong XP, Wideburg NE, Saldivar A, Ruiz L, Kati WM, Sham HL, Robins T, Stewart KD, Hsu A, Plattner JJ, Leonard JM, Norbeck DW, Proc Nat'l Acad Sci (USA) 92:2484 1995 [114] Lapatto R, Blundell T, Hemmings A, Overington J, Wilderspin A, Wood S, Merson JR, Whittle P J, Danley DE, Geoghegan K F, Hawrylik SJ, Lee EE, Scheld KG, Hobart PM, Nature 342:299 1989 [115] Wlodawer A, Miller M, Jaskolski M, Sathyanarayana B K, Baldwin E, Weber I T, Selk L M, Clawson L, Schneider J, Kent SB, Science 245,:616 1989 [116] Bagossi P, Cheng YE , Oroszlan S, Tozser J, Prot Eng 9:997 1996 [117] de Clercq E, Ann N Y Acad Sci 724:438 1994 [118] Richman D D Annu Rev Pharmacol Toxicol 33:149 1993 [119] Boyer PL, Ferris AL, Clark P, Whitmer J, Frank P, Tantillo C, Arnold E, Hughes SH, J Mol Biol 243:472 1994 [120] Tantillo C, Ding J, Jacobo-Molina A, Nanni RG, Boyer P L, Hughes SH, Pauwels R, Andries K, Janssen PA, Arnold E, J Mol Biol 24"~:369 1994 [121] Huang H, Chopra R, Verdine GL, Harrison SC, Science 282:1669 1998 [122] Alber F, Carloni P, submitted [123] Gu Z, Gao Q, Fang H, Salomon H, Parniak M, Goldberg E, Cameron J, Wainberg MA, Antimicrob Agents Chemother 38:275 1994 [124] Iversen AK, Sharer RW, Wehrly K, Winters MA, Mullins JI, Chesebro B, Merigan TC, J Virol 70:1086 1996 [125] Elion GB~ Furman PA, Fyfe JA, De Miranda P, Beauchamp Acad Sci 74 5716 1977
C, Schaeffer HJ, Proc Nat'l
[126] Schaeffer HJ, Beauchamp C, De Miranda P, Elion GB, Bauer DI, Collins P, Nature 272:583 1978 [127] Culver KW, Ram Z, Wallbridge S, Ishii H, Oldfield EH, Blaese RM, Science 256:1550 1992 [128] Chen S-H, Shine HD, Goodman JC, Grossman RG, Woo SLC, Proc Nat'l Acad Sci 91:3054 1991 [129] O'Malley BW Jr, Chen SH, Schwartz MR, Woo SLC, Cancer Res 55:1080 1995 [130] Chambers R, Gillespie GY, Soroceanu L, Andreansky S, Chatterjee S, Chou J, Roizman B, Whitely RJ, Proc Nat'l Acad Sci 92:1411 1995
250
[131] Vile RG, Hart IR, Cancer Res 53:3860 1993 [132] Caruso M, Panis Y, Gagandeep S, Houssin D, Salzmann J L, Klatzmann D, Proc Nat'l Acad Sci 90:7024 1993 [133] Alber F, Kuonen O, Scapozza L, Folkers G, Carloni P, Proteins Struc Func Gen 31:453 1998 [134] Wild K, Bohner T, Aubry A, Folkers G, Schulz GE, FEBS Lett 369:289 1995 [135] Pilger B, Perozzo R, Alber F, Wurth C, Folkers G, Scapozza L, J Biol Chem 274:31967 1999 [136] The sulfur atom of Met 128 is 4 8/~ away from the thymine ring. Therefore, it should in principle be possible to find sizable polarization effects on the sulfur [137] see e.g. "Mechanistic Bioinorganic Chemistry" (Thorp H H, Pecoraro V L, Eds, American Chemical Society, Washington D C 1995); "Bioinorganic Catalysis" (Reedijk J, Ed, Marcel Dekker, New York 1993) [138] for a review on Goase see e g : Whittaker JW, in Metals Ions in Biological Systems (Sigel H, Sigel A, Eds, Marcel Dekker, New York 1993), Vol 30, p 315 [139] Johnson JM, Halsall HB, Heineman WR Anal Chem 54:1394 1982 [140] Ito N, Phillips SEV , Stevens C, Ogel ZB, McPherson MJ, Keen JN, Yadav KDS and Knowles PF, Nature 350:87 1991 [141] Ito N, Phillips SEV, Yadav KDS, Knowles PF, J Mol Biol 238:794 1994 [142] Whittaker MM, De Vito VL, Asher SA, Whittaker JW, J Biol Chem 264:7104 1989 [143] Whitaker MM, Whittaker JW, J Biol Chem 265:9610 1990 [144] Gerfen G J, Bellew BI, Griffin RG, Singel D J, Eckberg AC, Whittaker JW, J Phys Chem 100:16739 1996 [145] Branchaud BP, Montague-Smith MP, Kosman DJ, McLaren FR, J Am Chem Soc 1993 115:798 1993 [146] Adams H, Bailey NA, Campell IK, Fenton DE, He QY, J Chem Soc, Dalton Trans 2233 1996 [147] Wang Y, Stack TDP, J. Am. Chem. Soc. 118:13097 1996 [148] Halfen JA, Young V G Yr, Tolman WB Angew Chem Int Ed Engl 35:1687 1996 [149] Whittaker MM, Duncan WR, Whittaker JW Inorg Chem 35:382-386 1996 [150] Halfen JA, Jazdzewski BA, Mahapatra S, Berreau L M, Wilkinson EC, Que L Jr, Tolman WB, J. Am. Chem. Soc. 
119:8217 1997 [151] Sokolowski A, Leutbecher H, Weyermiiller T, Schnepf R, Bothe E, Bill E, Hildebrandt P, Wieghardt K, J Biol Inorg Chem 2:244 1997
251
[152] Fontecave M, Pierre JL, Coord Chem Rev 170:125 1998 [153] Vaidyanathan M, Viswanathan R, Palaniandavar M, Balasubramanian T, Prabhaharan P, Muthiah TP, Inorg Chem 37:6418 1998 [154] Ito S, Nishino S, Itoh H, Ohba S, Nishida Y Polyhedron 17:1637 1998 [155] Ruf M, Peripont CG Angew Chem Int Ed 1998 37:1736 1998 [156] Wang Y, Dubois JL, Hedman B, Hodgson KO, Stack TDP Science 278:537 1998 [157] (a) Chaudhuri P, Hess M, F15rke U, Wieghardt K, Angew Chem Intl Ed 37:2217 1998 (b) Chaudhuri P, Hess M, Weyermiiller T, Wieghardt K, ibid, 1 38:1095 1999 [158] Rothlisberger U, Carloni P Intl J Quant Chem 1999 73:209 1999 [159] Rothlisberger U, Carloni P, Doclo K, Parrinello M, J Biol Inorg Chem, in press (2000) [160] Eichinger M, Tavan P, Hutter J, Parrinello M J, Chem Phys 110:10452 1999 [161] Wachter RM, Branchaud BP, Biochim Biophys Acta 138 4:43 1998 [162] Baron AJ, Stevens C, Wilmot C, Seneviratne KD, Blakeley V, Dooley DM, Phillips SE, Knowles PF, Mc Pherson MJ J Biol Chem 269:25095 1994 [163] VandeVondele J, Rothlisberger U (to be published)
This Page Intentionally Left Blank
L.A. Eriksson (Editor), Theoretical Biochemistry: Processes and Properties of Biological Systems, Theoretical and Computational Chemistry, Vol. 9. © 2001 Elsevier Science B.V. All rights reserved.
Chapter 7
Computational enzymology: Protein tyrosine phosphatase reactions

K. Kolmodin, V. Luzhkov* and J. Åqvist
Department of Cell and Molecular Biology, Uppsala University, Biomedical Center, Box 596, SE-751 24 Uppsala, Sweden

1. INTRODUCTION

Phosphoryl transfer to and from specific tyrosine residues in proteins is an important regulatory (signaling) mechanism involved in cellular processes such as cell growth, proliferation, differentiation and T-cell activation [1-3]. The cascades of phosphoryl transfer reactions catalyzed by the phosphorylating protein tyrosine kinases and the dephosphorylating protein tyrosine phosphatases (PTPases) form an extremely complex network of interacting proteins in the cell. The combined actions of the kinases and phosphatases determine the level of phosphorylation of their target proteins and thereby ensure the correct timing of cellular processes. Most of the proteins regulated by each specific type of PTPase have not yet been identified. Nevertheless, PTPases hydrolyze phosphotyrosyl-containing proteins and peptides as well as small aryl phosphates in vitro, which makes it possible to characterize them biochemically.

Interest in PTPases has grown rapidly in recent years, and considerable progress has been made towards elucidating their catalytic machinery. Several enzymological studies and crystal structures have been reported [4-11], but despite these advances a few fundamental questions regarding the catalytic reaction mechanism remain unanswered. Here we present a computational study of the reaction catalyzed by the PTPases. The calculations, based on crystal structures of three different phosphatases in total, were performed to answer detailed mechanistic questions that are not always accessible to classical enzymology. A quantum chemical investigation of kinetic isotope effects in phosphate monoester hydrolysis is also presented.
* Permanent address: Institute of Problems of Chemical Physics, Russian Academy of Sciences, Chernogolovka, Moscow Region, 142432, Russian Federation.
[Figure 1 image: schematic of the active site, with the substrate phosphate bound to the P-loop and the Arg, Ser, Cys and Asp side chains]
Figure 1. Schematic view of the active site in a typical PTPase.
2. PROTEIN TYROSINE PHOSPHATASE REACTIONS

2.1. Protein tyrosine phosphatases

The PTPases can be divided into three subfamilies based on their primary structure:

I. The major family of the PTPases is formed by proteins containing at least one homologous catalytic domain. These proteins can be either membrane bound (e.g. the leukocyte antigen-related PTPase, LAR) or non-membrane bound (e.g. PTP1B).
II. The dual specificity phosphatases (DSPases) can dephosphorylate serine/threonine residues as well as tyrosine residues. This family includes, for example, the Vaccinia H1-related phosphatase (VHR) and the cell cycle controlling phosphatases Cdc25.
III. The cytosolic low molecular weight PTPases (LMPTP) form a distinct class, containing a catalytic domain of only 140-180 residues.

There is also a large number of specific serine/threonine phosphatases that use a totally different catalytic strategy. These enzymes utilize bound metal ions to catalyze the reaction, which is not the case in the PTPases and DSPases.

The PTPases all possess the active site signature motif H/V-C-(X)5-R-S/T comprising the characteristic phosphate binding loop, referred to as the P-loop. The backbone amide NH groups of the P-loop residues are oriented towards the center of the substrate binding crevice, forming a phosphate anion hole (Figure 1). As an extension of the P-loop backbone, the guanidinium group of the invariant arginine side chain is involved in binding the substrate and stabilizing the transition states by forming a bidentate interaction with two of the non-bridging oxygens of the substrate phosphate group. The hydroxyl group of the serine/threonine residue immediately after the arginine (not in Cdc25) forms a hydrogen bond with the catalytic cysteine. In addition to the P-loop, all PTPases (except possibly Cdc25) also possess a conserved aspartic acid residue positioned on a more or less flexible loop close to the active site. In most PTPase crystal structures this side chain is within hydrogen bonding distance of the bridging oxygen of the ligand. This residue is therefore believed to function as a general acid that donates its proton to the leaving group oxygen.

2.2. The PTPase reaction mechanism

[Figure 2 image: scheme of the two-step PTPase reaction involving the catalytic Cys, the general acid/base Asp and the substrate phosphate]
Figure 2. The reaction catalyzed by the protein tyrosine phosphatases.
PTPases catalyze the hydrolysis of phosphate monoesters, yielding inorganic phosphate and the dephosphorylated substrate as products. The similarity of active site structures and kinetic properties, such as formation of a cysteinyl phosphate intermediate and pH-rate profiles, among different types of PTPases [12-14] indicates that they all employ a common catalytic mechanism. The catalytic reaction in PTPases has been shown to proceed via a double displacement mechanism involving a phosphoenzyme intermediate in which the phosphate group is covalently bound to the cysteine residue of the active site motif [15]. This thiophosphate intermediate is formed in a substitution reaction where the catalytic cysteine attacks the phosphorus atom and the leaving group oxygen is protonated by the general acid as the P-O bond is cleaved [16,17]. The same aspartate residue is thought to subsequently activate a water molecule that hydrolyzes the phosphorylated cysteine in the following step (Figure 2). The catalytic cysteine is most likely in its ionized form when the first nucleophilic displacement takes place. However, it is still unclear whether the catalytic cysteine is in its thiol or ionized form in the free enzyme, as well as in the enzyme-substrate complex.

Here we describe how the total free energy profile of the reaction catalyzed by a low molecular weight PTPase (LMPTP) is calculated using the empirical valence bond method. By combining the results with binding free energy calculations, the protonation state of the reacting fragments is determined. The consistency of the calculated reaction free energy profile is further verified by studies of mutant enzymes.

3. THE EMPIRICAL VALENCE BOND METHOD
The empirical valence bond (EVB) method describes chemical reactions in terms of resonance structures, or valence bond (VB) states, that represent different bonding arrangements and charge distributions along a reaction pathway [18-20]. EVB can be combined with molecular dynamics (MD) simulations and free energy perturbation (FEP) techniques to obtain free energy profiles of reactions in different environments, for example in water solution and in the active site of an enzyme. Here, MD is mainly used as a tool for thermal sampling of the system, while the FEP technique is used to drive the reaction from reactant to product and allows the free energy profile (potential of mean force) to be calculated. The diagonal elements of the EVB Hamiltonian correspond to the diabatic energies of the valence bond states and are given by a regular force field expression

$$\varepsilon_i = H_{ii} = V_{\mathrm{bond}}^{(i)} + V_{\mathrm{angle}}^{(i)} + V_{\mathrm{torsion}}^{(i)} + V_{\mathrm{nb,rr}}^{(i)} + V_{\mathrm{nb,rs}}^{(i)} + V_{\mathrm{ss}} + \alpha^{(i)} \tag{1}$$

where the first four terms describe the bonded and non-bonded energies of the molecular fragments corresponding to the ith resonance structure, while $V_{\mathrm{nb,rs}}^{(i)}$ denotes their non-bonded interaction with the surrounding system. The sixth term represents bonded and non-bonded interactions within the surrounding system, which are the same for all resonance structures. The parameter $\alpha^{(i)}$ determines the gas phase energy of the ith state with the fragments at infinite separation [18-20]. The actual ground-state energy $E_g$ of the system at a given configuration is obtained by mixing the VB states through the off-diagonal elements (resonance integrals) $H_{ij}$ and solving the secular equation

$$\mathbf{H}\mathbf{C} = E_g\mathbf{C} \tag{2}$$
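For two VB states, Eq. (2) is a 2×2 eigenvalue problem whose lowest root has a closed form. The minimal Python sketch below returns $E_g$ and the weight of VB state 1 in the ground state for given $\varepsilon_1 = H_{11}$, $\varepsilon_2 = H_{22}$ and coupling $H_{12}$ (energies in kcal/mol). The function and argument names are ours and are not taken from the program Q.

```python
import math


def evb_ground_state(h11, h22, h12):
    """Lowest root of the two-state EVB secular equation HC = E_g C.

    h11, h22: diabatic energies of VB states 1 and 2 (Eq. 1), kcal/mol.
    h12:      off-diagonal coupling (resonance integral), kcal/mol.
    Returns (E_g, w1), where w1 = c1**2 is the weight of state 1 in the
    adiabatic ground state (w2 = 1 - w1).
    """
    mean = 0.5 * (h11 + h22)
    root = math.hypot(0.5 * (h11 - h22), h12)  # sqrt(((h11 - h22)/2)^2 + h12^2)
    e_g = mean - root
    if root == 0.0:
        # Degenerate, uncoupled states: any mixture is an eigenvector.
        return e_g, 0.5
    w1 = 0.5 * (1.0 + (h22 - h11) / (2.0 * root))
    return e_g, w1


# At the diabatic crossing the ground state lies |H12| below the crossing
# point and both VB states contribute equally.
print(evb_ground_state(20.0, 20.0, 5.0))  # (15.0, 0.5)

# Far from the crossing the ground state is almost purely the lower VB state.
print(evb_ground_state(0.0, 100.0, 5.0))
```

With more than two VB states, as in the PTPase reaction, the same lowest root would be obtained from a general symmetric eigensolver instead of the closed form.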
One advantage of the EVB approach is that the parameters $\alpha_i$ and $H_{ij}$ can be calibrated using experimental information on reaction free energies ($\Delta G^0$) and activation barriers ($\Delta G^{\ddagger}$) for relevant reference reactions in solution. The resulting parameters (typically $\Delta\alpha$ and $H_{12}$) are then used without change in simulations of the enzyme reaction. The result is thus the effect on the free energy profile of transferring the reaction from one environment (water solution) to another (solvated enzyme). The free energy is evaluated by driving the system between different VB states using an FEP mapping potential of the form

$$\varepsilon_m = \sum_i \lambda_i^m \varepsilon_i, \qquad \sum_i \lambda_i^m = 1 \tag{3}$$

where the mapping vector $\boldsymbol{\lambda}_m$ with components $\lambda_i^m$ is changed in small incremental steps. For a two-state reaction the mapping potential is typically

$$\varepsilon_m = \varepsilon_1(1-\lambda_m) + \varepsilon_2\lambda_m, \qquad \lambda_m \in [0,1] \tag{4}$$
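The mapping potential of Eqs. (3) and (4) is used in two stages: FEP between neighbouring λ windows gives $\Delta G(\lambda_m)$ (Eq. 6), and each window is then reweighted onto the ground-state surface and binned on the energy gap $X = \varepsilon_1 - \varepsilon_2$ (Eq. 5). The Python sketch below runs both stages on a hypothetical one-dimensional model (two displaced harmonic VB states; the force constant, $\Delta\alpha$ and $H_{12}$ are chosen only for illustration), where exact Boltzmann samples can be drawn without MD. Combining the bins from different windows by a plain average is our simplifying assumption and is not necessarily what the program Q does.

```python
import math
import random

RT = 0.596  # kcal/mol at about 300 K


def diabatic(x, k=200.0, dalpha=-3.0):
    """Hypothetical 1D model: two displaced harmonic VB states (kcal/mol)."""
    return 0.5 * k * x * x, 0.5 * k * (x - 1.0) ** 2 + dalpha


def ground(e1, e2, h12):
    """Lowest eigenvalue of the 2x2 EVB Hamiltonian."""
    return 0.5 * (e1 + e2) - math.sqrt(0.25 * (e1 - e2) ** 2 + h12 * h12)


def evb_fep_us(h12=5.0, k=200.0, dalpha=-3.0, n_win=21, n_samp=4000,
               bin_w=4.0, min_count=50, seed=1):
    rng = random.Random(seed)
    lams = [m / (n_win - 1) for m in range(n_win)]
    # In this toy model eps_m (Eq. 4) is harmonic with its minimum at x = lambda,
    # so exact Boltzmann samples are Gaussian. A real study would use MD here.
    sigma = math.sqrt(RT / k)
    samples = [[rng.gauss(lam, sigma) for _ in range(n_samp)] for lam in lams]

    # Eq. (6): free energy of each mapping potential by FEP between neighbours.
    dG = [0.0]
    for m in range(n_win - 1):
        dl = lams[m + 1] - lams[m]
        acc = 0.0
        for x in samples[m]:
            e1, e2 = diabatic(x, k, dalpha)
            acc += math.exp(-dl * (e2 - e1) / RT)  # eps_{m+1} - eps_m
        dG.append(dG[-1] - RT * math.log(acc / n_samp))

    # Eq. (5): reweight each window onto E_g, binned on X = eps1 - eps2.
    est = {}
    for m, lam in enumerate(lams):
        sums, counts = {}, {}
        for x in samples[m]:
            e1, e2 = diabatic(x, k, dalpha)
            e_m = (1.0 - lam) * e1 + lam * e2
            b = math.floor((e1 - e2) / bin_w)
            sums[b] = sums.get(b, 0.0) + math.exp(-(ground(e1, e2, h12) - e_m) / RT)
            counts[b] = counts.get(b, 0) + 1
        for b, s in sums.items():
            if counts[b] >= min_count:  # skip poorly sampled tails
                est.setdefault(b, []).append(dG[m] - RT * math.log(s / (n_samp * bin_w)))
    profile = sorted((b, sum(v) / len(v)) for b, v in est.items())

    # Reactant side has X < 0, product side X >= 0.
    g_r, b_r = min((g, b) for b, g in profile if b < 0)
    g_p, b_p = min((g, b) for b, g in profile if b >= 0)
    g_ts = max(g for b, g in profile if b_r <= b <= b_p)
    return g_p - g_r, g_ts - g_r, dG


dg0, dg_ts, dG = evb_fep_us()
print(round(dg0, 2), round(dg_ts, 2), round(dG[-1], 2))
```

Because the energy gap is linear in x for this model, the reconstructed $\Delta g(X)$ should reproduce the analytic ground-state curve $E_g$ up to a constant, and the FEP sum over all windows should approach $\Delta\alpha$; both make convenient self-checks before applying the same bookkeeping to real trajectories.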
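The actual ground-state free energy profile is then obtained from the expression

$$\Delta g(X) = \Delta G(\lambda_m) - RT\ln\left\langle \exp\{-[E_g(X) - \varepsilon_m(X)]/RT\}\right\rangle_m \tag{5}$$

where

$$\Delta G(\lambda_m) = -RT\sum_{m'=0}^{m-1}\ln\left\langle \exp\{-(\varepsilon_{m'+1} - \varepsilon_{m'})/RT\}\right\rangle_{m'} \tag{6}$$

$\Delta G(\lambda_m)$ in Equation 5 denotes the mapping free energy for the particular value of the mapping vector $\boldsymbol{\lambda}_m$ that contributes to the sampling of the reaction coordinate value $X$. The generalized multidimensional reaction coordinate is, as usual, taken as the energy gap $\Delta\varepsilon$ between the relevant diabatic VB states [18-20].

3.1. EVB and the PTPase reaction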
The valence bond structures used in the present calculations are shown in Figure 3, and the reaction is thus modeled in terms of conversions between these different states. The first step of the reaction (Φ1→Φ2) represents activation of the nucleophile by proton transfer from the cysteine to the dianionic phosphate group of the substrate. The next step is the formation of the transient high-energy penta-coordinated structure (Φ3), followed by release of the leaving group with concerted proton transfer from the general acid residue (Φ3→Φ4). In Φ4 the phosphate group is covalently bound to the enzyme via a thiophosphate linkage.
In the second part of the reaction the leaving group is replaced by a water molecule, which hydrolyzes the phosphoenzyme via a second penta-coordinated structure (Φ5→Φ6). The nucleophilic addition is modeled to occur concertedly with proton transfer from the water to the general base residue (the same residue that acted as a general acid in the first step). Inorganic phosphate is released as the final product as the S-P bond is broken (Φ6→Φ7), and finally one of the protons is transferred back to the cysteine (Φ7→Φ8), yielding the initial state of the enzyme. Since the phosphate oxygens in the enzyme are not equivalent, due to restricted rotation of the phosphate group as shown schematically in Figure 1, it is necessary to consider three separate VB structures for each state with a singly protonated phosphate group. For the first reaction step we also examine the most plausible pathway for an unprotonated mechanism (Ψ2→Ψ3→Ψ4, with a total charge of -3 on the reacting fragments). In this case there is no proton transfer between the nucleophile and the phosphate group, and the negatively charged cysteine reacts directly with the phosphorus atom of the dianionic substrate.

The EVB Hamiltonians for the different phosphoryl transfer reactions were calibrated against relevant solution reactions utilizing experimental energetics data as well as semi-empirical and ab initio geometry optimizations (see below). As described elsewhere [18-20], the EVB calibration involves determining gas-phase energy differences $\Delta\alpha_{ij} = \alpha^{(j)} - \alpha^{(i)}$ as well as off-diagonal matrix elements $H_{ij}$ between pairs of VB states, so that the EVB potential surface reproduces experimental reaction free energies and barrier heights of relevant reference reactions in solution. This calibration procedure thus involves simulations of uncatalyzed reaction steps with the reacting fragments in water and fitting the above parameters so that calculated and observed free energies coincide.

3.2. Calibration of the EVB potential

Recent ab initio calculations [21,22] combined with the Langevin dipoles (LD) and polarizable continuum model (PCM) solvation models, on the hydrolysis of mono- and dianions of methyl phosphate and various phenyl phosphate derivatives, as well as earlier quantum calculations [23,24], have shown that the reaction paths generally involve two transition states (TSs) separated by a high-energy minimum. Furthermore, the associative and dissociative reaction mechanisms have been found to have similar energetics in solution [21,25]. These results are also consistent with Guthrie's analysis of available thermodynamic data [26]. A recent thermodynamic analysis of experimental information by us [25] demonstrated that both a late associative and an early dissociative TS can reproduce experimentally observed linear free energy relationships (LFERs) of phosphate ester hydrolysis reactions in solution. These LFERs have also been quantitatively reproduced by a recent ab initio+LD/PCM study of the associative reaction pathway [22]. The issue of associative versus dissociative mechanism turns out to be less important in this case (see below), and our main objective in calibrating the EVB surface here is to estimate the heights of the TSs in water.

In the case of proton transfer steps, such as Φ1→Φ2 and Φ7→Φ8, the pKa difference between donor and acceptor together with available LFERs for proton transfer steps were used as described in [27,28] for calibration of the relevant EVB parameters. Calibration of nucleophilic displacement steps, such as Φ2→Φ4 and Φ5→Φ7, utilized data from Kirby and Varvoglis [29] on hydrolysis with phenol leaving groups, from Åkerfeldt [30] on hydrolysis of phosphorothioic acids, from Bourne and Williams [31] on the dependence of equilibrium constants on leaving group pKa, and Guthrie's thermodynamic data on phosphoric acid derivatives [26]. For the hydrolysis of the phenyl phosphate dianion the rate was too slow to be measured in [29], but using the monoanion rate and the ratio between mono- and dianion hydrolysis for the 2-nitro, 4-nitro and 3,5-dinitro derivatives one can estimate an overall barrier of 32.8 kcal/mol for the phenyl phosphate dianion reaction in water. This value turns out to be entirely consistent with $\beta_{\mathrm{lg}} = -1.2$ (the Brønsted coefficient for log k vs. leaving group pKa) [29] and with the rate constant estimated by Guthrie [26] for the methyl phosphate dianion. The free energy barrier for hydrolysis of the ethyl thiophosphate dianion is obtained from [31] as 26.9 kcal/mol, using the same estimate of the ratio between mono- and dianion hydrolysis. Furthermore, the free energies of hydrolysis of phenyl phosphate and RSPO3²⁻ are obtained from the equilibrium constants in [31] as -3.0 and -3.8 kcal/mol, respectively (after correcting for the 55 M concentration of water), using pKas of 10.0 for phenol and 8.3 for cysteine.
From these pKas and those of water (15.7), PhOPO3H⁻ (5.7) and CH3CH2SPO3H⁻ (5.9), the barriers for reaction of OH⁻ with PhOPO3H⁻ and RSPO3H⁻ can be estimated as 19.2 and 13.5 kcal/mol, respectively. The barriers for PhO⁻ and RS⁻ attack on the phosphate monoanion (the reverse reactions) are likewise obtained as 32.0 and 29.2 kcal/mol, respectively. The barrier for the reaction of OH⁻ with the methyl phosphate monoanion is estimated to be 31.0 kcal/mol from [26]. The effect of changing the leaving group in the monoanion reaction with OH⁻ from -OCH3 to -OPh thus becomes 19.2 - 31.0 = -11.8 kcal/mol. This value can be compared with the value derived from $\beta_{\mathrm{lg}} = -1.2$ together with the corresponding $\Delta\Delta G^0$ of proton transfer between water and the phosphate as the leaving group is changed, which is -11.0 kcal/mol. Hence, our estimate of the effect of changing the leaving group from methanolate to phenolate appears entirely consistent with available data, and we will use $\Delta\Delta G^{\ddagger}_{\mathrm{lg}}(\mathrm{OCH_3}\rightarrow\mathrm{OPh}) = -11.4$ kcal/mol (the average of these two values). The same reasoning is employed to estimate the effect of changing the leaving group from -OCH3 to -SR, where we obtain $\Delta\Delta G^{\ddagger}_{\mathrm{lg}}(\mathrm{OCH_3}\rightarrow\mathrm{SR}) = -15.6$ kcal/mol. These results can now be combined to give two free energy profiles, each having two TSs separated by a high-energy minimum, for the uncatalyzed reaction in water:
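$$\mathrm{RSH} + \mathrm{PhOPO_3^{2-}} \rightarrow \mathrm{RSPO_3^{2-}} + \mathrm{PhOH} \tag{7}$$

$$\mathrm{RSPO_3^{2-}} + \mathrm{H_2O} \rightarrow \mathrm{RSH} + \mathrm{HPO_4^{2-}} \tag{8}$$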
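The pKa-based conversions in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below is our reconstruction and not necessarily the authors' exact procedure. Each barrier is shifted by 2.303RT per pKa unit of the proton transfers involved, at an assumed 298.15 K. The reverse barriers additionally need the pKa of H2PO4⁻, taken here as 7.2, a standard literature value that the text does not quote. With these assumptions the sketch reproduces the quoted 19.2, 13.5, 32.0 and 29.2 kcal/mol and the -11.4 kcal/mol average.

```python
import math

R = 1.98720e-3  # gas constant, kcal/(mol K)
T = 298.15      # K (assumed)
C = math.log(10) * R * T  # 2.303 RT, about 1.364 kcal/mol per pKa unit

PKA_WATER = 15.7
PKA_H2PO4 = 7.2  # assumption: not quoted in the text


def monoanion_barrier(dg_dianion, pka_ester_monoanion):
    """Barrier for OH- + ROPO3H- from the kinetically equivalent H2O + ROPO3(2-)
    barrier. Both share the same TS, but the OH- + ROPO3H- state lies
    2.303RT[pKa(H2O) - pKa(ROPO3H-)] higher, so the barrier is lower by that much."""
    return dg_dianion - C * (PKA_WATER - pka_ester_monoanion)


def monoanion_reaction_free_energy(dg_hydrolysis, pka_ester_monoanion, pka_leaving):
    """Convert ROPO3(2-) + H2O -> ROH + HPO4(2-) into
    OH- + ROPO3H- -> RO- + H2PO4- via four proton-transfer equilibria."""
    return dg_hydrolysis + C * (pka_ester_monoanion - PKA_WATER + pka_leaving - PKA_H2PO4)


results = {}
for name, dg_ts, dg_hyd, pka_mono, pka_lg in [
    ("PhO", 32.8, -3.0, 5.7, 10.0),  # phenyl phosphate, phenol leaving group
    ("RS", 26.9, -3.8, 5.9, 8.3),    # thiophosphate, cysteine leaving group
]:
    fwd = monoanion_barrier(dg_ts, pka_mono)
    rev = fwd - monoanion_reaction_free_energy(dg_hyd, pka_mono, pka_lg)
    results[name] = (round(fwd, 1), round(rev, 1))

print(results)  # {'PhO': (19.2, 32.0), 'RS': (13.5, 29.2)}

# Leaving-group effect OCH3 -> OPh, averaged with the LFER-based -11.0 kcal/mol.
dd_lg = monoanion_barrier(32.8, 5.7) - 31.0
print(round(dd_lg, 1), round(0.5 * (dd_lg - 11.0), 1))  # -11.8 -11.4
```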
The two barriers of the first reaction step are 21.3 and 22.8 kcal/mol, where the former is mainly associated with the incoming nucleophile (RS⁻) and the latter with the leaving group (PhO⁻). The issue of associative vs. dissociative mechanism becomes less important here, since it mainly pertains to which barrier comes first, and the two barriers are of similar height. The level of the high-energy transient intermediate structure Φ3 is more difficult to estimate accurately, since it is not directly accessible to experiment. However, this state mainly serves as a reference for calculating the flanking barriers, so its energy is not critical for our simulations. That is, our conclusions would not be affected even if Φ3 were not a minimum, which would correspond to a reaction profile with a single 22.8 kcal/mol activation barrier. Our earlier estimate [32] of the free energy of the penta-coordinated transient state Φ3, at 12.7 kcal/mol above the RS⁻ + PhOPO3H⁻ state Φ2, was based on AM1-SM2 calculations, since these were found to agree with Guthrie's estimate of Φ3 for the case with OH groups as axial ligands. In view of the recent MP2/6-31+G**//HF/6-31G* plus LD calculations [22], this appears to be an underestimate, and we will instead use 16.5 kcal/mol for this free energy difference, which is in better agreement with [22]. The free energy of Φ3 is then only 1-3 kcal/mol below the two activation barriers. Geometries of various penta-coordinated species with different combinations of axial groups were optimized with the AM1-SM2 and PM3-SM3 Hamiltonians [33]. For the case with RS⁻ and PhO⁻ as axial groups, both AM1-SM2 and PM3-SM3 locate a similar minimum. We used the geometry of Φ3 from the former calculations, which gives axial ligand distances of about 2.4 Å for both ligands.
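The calibration described in Section 3.1 fits Δα and H12 so that simulated solution free energies match target values such as the barriers just quoted. In practice each trial parameter set requires FEP/umbrella-sampling simulations. The sketch below replaces them with a hypothetical one-dimensional surface (two displaced harmonic VB states with an assumed force constant) so that only the fitting loop is visible. Alternating a bisection on H12 with a correction of Δα is our own simple choice, not the procedure of [18-20], and the targets in the usage line are illustrative.

```python
import math


def ground(e1, e2, h12):
    """Lowest eigenvalue of the 2x2 EVB Hamiltonian."""
    return 0.5 * (e1 + e2) - math.sqrt(0.25 * (e1 - e2) ** 2 + h12 * h12)


def energetics(dalpha, h12, k=200.0, n=801):
    """Reaction free energy and barrier on a toy 1D EVB surface.

    Two displaced harmonic diabatic states (force constant k, minima at x = 0
    and x = 1) stand in for the simulated free energy curves.
    """
    xs = [-0.5 + 2.0 * i / (n - 1) for i in range(n)]
    eg = [ground(0.5 * k * x * x, 0.5 * k * (x - 1.0) ** 2 + dalpha, h12) for x in xs]
    xc = 0.5 + dalpha / k  # crossing point of the diabatic curves
    ic = min(range(n), key=lambda i: abs(xs[i] - xc))
    ir = min(range(ic + 1), key=lambda i: eg[i])   # reactant minimum
    ip = min(range(ic, n), key=lambda i: eg[i])    # product minimum
    return eg[ip] - eg[ir], max(eg[ir:ip + 1]) - eg[ir]


def calibrate(dg0_target, dgts_target, k=200.0, iterations=10):
    """Fit (delta_alpha, H12) so that the toy surface reproduces the targets."""
    dalpha, h12 = dg0_target, 0.0
    for _ in range(iterations):
        lo, hi = 0.0, 50.0  # the barrier decreases as the coupling grows
        for _ in range(40):
            mid = 0.5 * (lo + hi)
            if energetics(dalpha, mid, k)[1] > dgts_target:
                lo = mid
            else:
                hi = mid
        h12 = 0.5 * (lo + hi)
        # Shift delta_alpha by the remaining error in the reaction free energy.
        dalpha += dg0_target - energetics(dalpha, h12, k)[0]
    return dalpha, h12


dalpha, h12 = calibrate(-3.0, 19.2)
print(round(dalpha, 2), round(h12, 2), energetics(dalpha, h12))
```

The fitted parameters are then carried unchanged into the enzyme simulation, which is why any systematic error in the toy surface would transfer as well; in the real procedure that surface comes from sampling with the full solvent model.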
One can also note that AM1, PM3, HF/6-31G* and HF/6-31+G** optimizations of CH3SPO3H⁻ and CH3SPO3²⁻ all give consistent S-P bond lengths of about 2.1-2.2 Å and 2.4 Å for the mono- and dianion, respectively.
261
/
O
H-N O
~., =~
o
"P--O
S-H
06 ,O / H-N
~2
?_,
o =~
o
s|
." _ "-@o o
O
/ H-N
~2
O
~ )---" s|
P-O
O
/ H -- N /
I~ 3
o=:~s
O
H-N
O
}--
..... ~; ..... o
~3
~|
O
O
O :~
0
I
\ S ..... P ...... 0
o/ ~o
|
/
H "O
/
H--N
@
:
o
O O
\
@o
4
H-N
~4
o
O \
~ - - / 'i \( o~~." O O
o H
Qo
o~ / H-N
~-x
|
,,O
IH
o=C ~io~ ~ \H
@ O
/ H-N O
~6 o=~
I
S ..... P ..... 0
/
H
g'o| ,~
.. o
/ H-N
(~)7
)__,,
o==~
sO
H--O
|
o "
/
P-O
/ H--N
)--,,
~8 o=~
~-~
o o,.P
-
O
|176
Figure 3. Valence bond states used in the EVB calculations of the reaction mechanism catalyzed by the LMPTP. The formal charges of the reacting fragments are indicated.
These results also appear to be consistent with the crystal structure of a small double zwitterionic phosphate compound [34], in which the double negative phosphate charge is partly neutralized by interaction with cations.

The top curve in Figure 4a summarizes the energetics of the protonated (-2) reaction in water, where the effect on the equilibrium constant of protonation from an aspartate has also been included. For the alternative unprotonated (-3) reaction, where neither Cys12 nor the substrate phosphate group is protonated, the solution energetics is estimated directly from the observed values $\beta_{\mathrm{lg}} = -1.2$ and $\beta_{\mathrm{nuc}} = 0.13$ [29], with Guthrie's value for the OH⁻ + CH3OPO3²⁻ barrier (42.6 kcal/mol) as the starting point. The two TSs are then found at 31.0 and 34.1 kcal/mol, i.e. about 10 kcal/mol higher activation energies than for the protonated mechanism. The top curve in Figure 4b shows this estimated reaction profile in water, where the effect of protonation from an aspartate is also included. Again, the order of the two barriers may be interchanged, or they may even merge into a single 34.1 kcal/mol barrier (as the reaction approaches true SN2 character, where Ψ3 is no longer a minimum). However, as emphasized above, this does not affect the conclusions from our simulations.

3.3. Simulation details

The force field parameters for the different VB states were taken as far as possible from the GROMOS87 potential [35], which was also used to model the rest of the system. However, bonds within the reacting fragments were represented by Morse potentials using standard bond lengths and dissociation energies. Charges for the non-standard moieties involving S-P bonding were derived from AM1-SM2 calculations and merged with those of the standard GROMOS fragments to maintain compatibility with them. Charges and van der Waals parameters for the thiolate species were those developed by Hansson et al. [27].
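A Morse term behaves like a harmonic bond near its equilibrium length but levels off at the dissociation energy $D_e$, which is what allows bonds in the reacting fragments to break and form as the VB states change. A minimal sketch, using the common choice of matching the Morse width parameter to a harmonic force constant ($E = \tfrac{1}{2}k(r-r_0)^2$). The numerical values in the usage lines are hypothetical and are not the parameters used in this work.

```python
import math


def morse(r, d_e, r0, a):
    """Morse bond energy D_e * (1 - exp(-a (r - r0)))**2; zero at r = r0,
    approaching D_e as the bond dissociates."""
    return d_e * (1.0 - math.exp(-a * (r - r0))) ** 2


def morse_a_from_harmonic(k_harm, d_e):
    """Width parameter that reproduces a harmonic force constant at r0:
    V''(r0) = 2 a^2 D_e = k, hence a = sqrt(k / (2 D_e))."""
    return math.sqrt(k_harm / (2.0 * d_e))


# Hypothetical bond parameters for illustration only (kcal/mol, angstrom).
D_E, R0, K_HARM = 100.0, 1.6, 600.0
A = morse_a_from_harmonic(K_HARM, D_E)
for r in (1.6, 2.0, 3.0, 6.0):
    harmonic = 0.5 * K_HARM * (r - R0) ** 2
    print(r, round(morse(r, D_E, R0, A), 1), round(harmonic, 1))
```

The printed comparison shows the two forms agreeing near $r_0$ while the harmonic energy keeps rising at long distances, where the Morse energy saturates at $D_e$.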
The protein coordinates used in the MD simulations were those of bovine liver LMPTP in complex with sulfate ion [5] (PDB entry 1PHR). The phenyl phosphate substrate was modeled into the crystal structure using the graphics program InsightII [36]. The phosphorus atom was positioned approximately where the sulfur atom of the sulfate is found in the crystal structure, letting the phenyl ring fit into the narrow hydrophobic binding slot. In addition to Cys12, seven residues close to the reaction center were considered to be charged (Arg18, Arg53, Asp48, Asp56, Arg58, His72 and Asp92), whereas Asp129 (the general acid) was protonated. Other charged groups distant from the reaction center were replaced by neutral dipolar groups. All MD/FEP/EVB calculations were carried out using the program Q [37]. The reaction center was surrounded by a 16 Å sphere of SPC water in the solution (calibration) simulations and by a sphere of the same size containing both protein and water in the enzyme simulations. Nine crystal waters close to the active site were kept at their original positions as the water sphere was generated. Water molecules generated closer than 2.3 Å to the protein or crystal waters were removed. Protein atoms outside this sphere were restrained to their crystallographic coordinates and interacted only via bonds, angles and torsions across the boundary during the simulations. A non-bonded cut-off radius of 10 Å was used together with the local reaction field (LRF) method [38] for longer range electrostatics. The water surface was subjected to radial and polarization surface restraints according to a new model described by Marelius et al. [37]. The protein systems were equilibrated by a 20 ps stepwise heating scheme followed by 50 ps of simulation at a constant temperature of 300 K. The water systems were equilibrated by simulating them directly for 50 ps at 300 K. The MD trajectories were run using a time step of 1 fs, and energy data were collected every fifth step. The free energy perturbations were sampled using 47-83 λ-points and 5 ps of simulation for each value of λ. Data from the first 2 ps of each step were discarded for equilibration.
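The sampling settings above fix how much data enters each FEP average. The small sketch below records them and derives the number of stored energy samples per λ window and the simulation time per FEP profile. The class and field names are hypothetical, and the 20 ps heating and 50 ps equilibration runs are not counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FepProtocol:
    """Sampling settings quoted in Section 3.3 (times in fs)."""
    n_windows: int          # number of lambda points (47-83 in this work)
    window_fs: int = 5000   # 5 ps per lambda value
    discard_fs: int = 2000  # first 2 ps of each window dropped as equilibration
    timestep_fs: int = 1
    save_every: int = 5     # energies stored every fifth step

    def samples_per_window(self) -> int:
        kept_steps = (self.window_fs - self.discard_fs) // self.timestep_fs
        return kept_steps // self.save_every

    def total_ps(self) -> float:
        return self.n_windows * self.window_fs / 1000.0


for n in (47, 83):
    p = FepProtocol(n_windows=n)
    print(n, p.samples_per_window(), p.total_ps())
# 47 600 235.0
# 83 600 415.0
```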
4. REACTION FREE ENERGY PROFILE OF THE LMPTP
4.1. Step 1: Substrate dephosphorylation

The free energy profile of the first part of the reaction, where the phosphate group is transferred to the catalytic cysteine of the enzyme (Cys12 in LMPTP), was calculated for both the protonated and the unprotonated reaction mechanism. The resulting free energy profiles from the EVB/FEP/MD calculations are summarized in Figure 4. The upper curves are those of the simulated uncatalyzed reference reactions in solution, after calibration against experimental data. The lower curves are obtained by simulating the corresponding reactions in the solvated enzyme. The enzyme exerts a substantial catalytic effect on both the monoanionic and the dianionic reaction and generally stabilizes the high-energy structures by about 10-20 kcal/mol compared with the uncatalyzed reactions. Surprisingly, the calculated activation barriers are 13.5-14 kcal/mol for both reactions, a value in excellent agreement with the reported rate of this step with p-nitrophenyl phosphate as substrate at pH 5: 540 s⁻¹ [39] and 789.5 s⁻¹ [17].
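The comparison between calculated barriers and measured rates rests on transition state theory. The minimal sketch below assumes the Eyring equation with a transmission coefficient of 1 and T = 300 K, the simulation temperature; the experimental temperatures are not stated here. Under these assumptions the two reported rate constants correspond to activation free energies of about 13.6-13.8 kcal/mol, inside the calculated 13.5-14 kcal/mol range.

```python
import math

KB = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34  # Planck constant, J s
R = 1.98720e-3      # gas constant, kcal/(mol K)


def eyring_barrier(k_rate, temp=300.0, kappa=1.0):
    """Activation free energy (kcal/mol) from a first-order rate constant (1/s),
    inverting k = kappa * (kB T / h) * exp(-dG / RT)."""
    return R * temp * math.log(kappa * KB * temp / (H * k_rate))


for k_obs in (540.0, 789.5):
    print(k_obs, round(eyring_barrier(k_obs), 1))
# 540.0 13.8
# 789.5 13.6
```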
[Figure 4 image: ΔG (kcal/mol) versus reaction coordinate. Panel a: states Φ1-Φ4, with curves for water and for LMPTP with H on O2, O3 and O4. Panel b: states Ψ2-Ψ4, with curves for water and LMPTP.]
Figure 4. a) Free energy profiles of the protonated mechanism with a total charge of -2 on the reacting fragments (Φ1→Φ4). The upper curve is the uncatalyzed reference reaction in water solution. The three lower curves are the same reaction simulated in the protein with the proton positioned on the three different oxygens as indicated in Figure 1. b) Reaction free energy profiles of the unprotonated mechanism with a total charge of -3 on the reacting fragments (Ψ2→Ψ4).

We find that for the protonated reaction the protein environment facilitates proton transfer from Cys12 to the phosphate group of the substrate (Φ1→Φ2), thus ensuring availability of the nucleophilic anion for the substitution reaction. This proton transfer would then correspond to a substrate-assisted reaction mechanism if the cysteine is in its thiol form in the free enzyme. The small difference in free energy between Φ1 and Φ2 (about 1.5 kcal/mol) indicates that the pKa of the cysteine is close to that of the substrate, i.e. that it is lowered by the enzymatic environment. It has previously been shown that, among other interactions, a hydrogen bond from Ser19 is important for lowering the pKa of Cys12 [14,27].

The simulated reaction profiles for the three possible proton transfers show no significant discrimination between the acceptor oxygens, and proton transfer from the cysteine to any of the three oxygens is feasible. The position of the proton has no major effect on the catalysis of the approach to the transition state (Φ2→Φ3). On the other hand, stabilization of the thiophosphate intermediate resulting from leaving group departure appears to be more sensitive to the nature of phosphate protonation. When the proton is bound to the O3 oxygen, which accepts a hydrogen bond from a guanidinium nitrogen of Arg18, it can engage in hydrogen bonding to the negatively charged Asp129. When the proton is bound to O2, the distance to Asp129 becomes too large to allow such hydrogen bonding.
For the third case, with the proton bound to O4, we observe a stabilization of the phosphoenzyme intermediate that lies between the other two cases. Simulations of the P-O bond cleavage and leaving group departure clearly indicate that bond cleavage at the bridging oxygen has to be concerted with protonation of the leaving group in order to avoid charge separation in the active site. This agrees with interpretations of solvent isotope effects and proton inventory experiments, which suggest that the proton from the general acid is largely transferred to the bridging oxygen in the transition state [40]. The bond cleavage was first simulated along a stepwise pathway, with bond breaking followed by proton transfer via a phenolate species. This pathway was predicted to be energetically unfavorable in the enzyme, with barriers of about 22 and 35 kcal/mol for the protonated and unprotonated reactions, respectively. A developing negative charge on the leaving group oxygen apparently cannot be stabilized by the relatively hydrophobic surroundings in this region, and since the binding cavity is very narrow, solvating water molecules are excluded from the active site. The concerted pathway, Φ3→Φ4, is strongly facilitated by the enzyme, and the resulting negative charge on Asp129 is, unlike the phenolate ion, accessible to solvent. An MD snapshot of the transition state region corresponding to P-O bond cleavage is shown in Figure 5.

Figure 5. Snapshot of the active site in the high-energy region of the reaction (Φ3→Φ4). The side chains of residues 13-17 are omitted for clarity. Stabilizing hydrogen bonds are indicated as broken lines and the partial axial bonds are dotted. Note the phosphate hydrogen positioned on oxygen O3, stabilized by Asp129, and the carboxylic proton, which is being transferred to the leaving group as the P-O bond is cleaved.
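PhOPO3H⁻ (water) → PhOPO3²⁻ (water): ΔG3
PhOPO3H⁻ (water) → PhOPO3H⁻ (protein): ΔG1 = ΔGbind(monoanion)
PhOPO3²⁻ (water) → PhOPO3²⁻ (protein): ΔG2 = ΔGbind(dianion)
PhOPO3H⁻ (protein) → PhOPO3²⁻ (protein): ΔG4

$$\Delta\Delta G_{\mathrm{bind}} = \Delta G_2 - \Delta G_1 = \Delta G_4 - \Delta G_3 \qquad (9)$$

Figure 6. Thermodynamic cycle for determination of the relative binding free energy between a monoanionic and a dianionic substrate.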
The calculated free energy profiles show that the activation energy of the unprotonated substitution reaction with concerted protonation by Asp129 is similar to that of the protonated reaction. However, the unprotonated reaction is exothermic by 13 kcal/mol. The difference in free energy between Ψ4 and Φ4 is given by 1.36(pH − pKa) kcal/mol, where the relevant pKa is that of the thiophosphate group in the enzyme. This pKa value is close to the pH normally used in experiments, so only a small free energy difference between Ψ4 and Φ4 is expected. The observed difference in Figure 4 is 16 kcal/mol, which indicates that the free energy profiles of the two simulated reactions are shifted relative to each other and that the exothermicity results from destabilization of the reactants (Ψ2) rather than from a large stabilization of the phosphoenzyme intermediate (Ψ4).

4.2. Binding free energy calculations

Substrate binding is a prerequisite for catalysis and an important step in the reaction, which must be considered for a complete understanding of the energetics. Here the most straightforward approach is to evaluate the difference in substrate affinity between the two protonation states. We therefore performed free energy perturbation (FEP) calculations in which the substrate phenyl phosphate was transformed from the monoanion (proton positioned on O3) to the dianion, both in aqueous solution and in the solvated protein with Cys12 in its anionic form, according to the thermodynamic cycle shown in Figure 6.
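The monoanion-to-dianion FEP calculation above, like the Cys→Ser perturbations later in the chapter, estimates a free energy difference by averaging over a series of λ windows. The chapter's working FEP equation is not reproduced in this section, so the sketch below assumes the standard Zwanzig exponential-averaging estimator and applies it to a hypothetical one-dimensional harmonic system whose answer is known analytically. Exact Gaussian sampling stands in for MD, and `KT`, `fep_harmonic` and the force constants are illustrative choices, not parameters from this work.

```python
import numpy as np

KT = 0.596  # kcal/mol, roughly RT at 300 K (assumed value)


def exp_average(du, kt=KT):
    """Zwanzig estimator for one window: dG = -kT ln < exp(-dU/kT) >."""
    return -kt * np.log(np.mean(np.exp(-du / kt)))


def fep_harmonic(k0, k1, n_windows=51, n_samples=20000, seed=0):
    """Staged FEP for the toy change U0 = k0*x**2/2 -> U1 = k1*x**2/2.

    Exact Gaussian sampling of each lambda state stands in for MD.
    Returns forward and backward estimates of G1 - G0 (kcal/mol).
    """
    rng = np.random.default_rng(seed)
    lambdas = np.linspace(0.0, 1.0, n_windows)   # 51 points -> 50 windows
    ks = k0 + lambdas * (k1 - k0)               # force constant at each lambda
    forward = 0.0
    backward = 0.0
    for ka, kb in zip(ks[:-1], ks[1:]):
        xa = rng.normal(0.0, np.sqrt(KT / ka), n_samples)  # samples at lambda_a
        xb = rng.normal(0.0, np.sqrt(KT / kb), n_samples)  # samples at lambda_b
        forward += exp_average(0.5 * (kb - ka) * xa**2)    # a -> b
        backward -= exp_average(0.5 * (ka - kb) * xb**2)   # b -> a, sign flipped
    return forward, backward


exact = 0.5 * KT * np.log(4.0)   # analytic G1 - G0 for k1/k0 = 4
fwd, bwd = fep_harmonic(1.0, 4.0)
print(f"exact {exact:.3f}  forward {fwd:.3f}  backward {bwd:.3f}")
```

Running the perturbation in both directions mirrors the forward/backward protocol used for the FEP simulations in this chapter; in a real protein calculation, the gap between the two estimates is a rough indicator of convergence.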
[Figure 7 diagram: free energy ΔG (kcal/mol) of the states E+S, ES, ES′, TS(−3) and E-P + l.g.]
Figure 7. Calculated thermodynamic cycle describing the relationship between the two possible mechanisms of catalysis in LMPTP.
The calculated difference in binding free energy was 15.9 ± 0.9 kcal/mol for the monoanion to dianion perturbation, indicating that, with Cys12 ionized, the enzyme has much lower affinity for a dianionic substrate than for a monoanionic one. Affinity calculations using the linear interaction energy approach [41,42] also confirmed a large difference in binding free energy. The MD structures further showed that the distance between the nucleophile and the phosphorus atom increased significantly (from 3.6 to 4.6 Å) due to electrostatic repulsion as the perturbation proceeded from monoanion to dianion. The average MD structures of the protonated and unprotonated states were superimposed on the crystal structure. The r.m.s. deviation for the heavy atoms of residues Cys12-Ser19, Asp129 and the phosphate group was 0.43 Å and 0.97 Å for the monoanionic and dianionic states, respectively, and the overall structure of the P-loop was clearly distorted when the proton was absent. In particular, the nucleophilic sulfur had moved 1.7 Å from its original coordinates, away from the substrate. On the other hand, with Cys12 ionized and a proton on oxygen O3, the average MD structures were in excellent agreement with the crystal structure [43]. The MD structures suggest that a dianionic substrate, although having favorable interactions with the P-loop amide nitrogens and the positively charged Arg18, is in an electrostatically disfavored position. It has been proposed that the positive arginine would effectively neutralize one of the charges on the phosphate group [44]. However, Arg18 also forms an ion pair with Asp92, which makes the positive charge in the active site less pronounced. With a monoanionic substrate, the hydrogen bond between the nucleophile and the phosphate hydroxyl group keeps the reactants in close contact. The destabilization of the unprotonated ES complex obtained from the binding calculations agrees well with the exothermicity observed in the reaction simulations with no proton present on the reacting groups. This allows us to close, rather accurately, the thermodynamic cycle describing the states involved in the two possible reaction pathways. The difference in binding free energy shifts the unprotonated ES state (Ψ2) by +16 kcal/mol relative to the corresponding protonated state (Φ2), and as a result the levels of Ψ4 and Φ4 closely coincide, as expected. The thermodynamic cycle summarizing the energetics of the substrate dephosphorylation step in LMPTP is shown in Figure 7. Thus, the most probable protonation state of the reacting groups in the PTPase-substrate complex was determined using the EVB and FEP techniques.
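The argument that Ψ4 and Φ4 should lie close together rests on the prefactor 1.36 in 1.36(pH − pKa), which is 2.303RT in kcal/mol near room temperature. A minimal sketch, assuming T = 298.15 K (the temperature is our assumption), computes the free energy of a protonated group relative to its deprotonated form; the function name and the half-unit pH/pKa mismatch are hypothetical illustrations, not values from the simulations.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def protonation_free_energy(pH, pKa, T=298.15):
    """G(protonated) - G(deprotonated) in kcal/mol, i.e. 2.303*R*T*(pH - pKa).

    Negative (protonated form favored) when pH < pKa.
    """
    return math.log(10.0) * R_KCAL * T * (pH - pKa)


prefactor = protonation_free_energy(1.0, 0.0)      # the "1.36" of the text
small_shift = protonation_free_energy(7.0, 6.5)    # hypothetical 0.5-unit mismatch
print(round(prefactor, 3), round(small_shift, 2))
```

With pH within one unit of the pKa, this term is at most about 1.4 kcal/mol, small next to the 16 kcal/mol offset between the simulated profiles, which is why the offset is attributed to differential substrate binding rather than to the protonation equilibrium.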
4.3. Step 2: Phosphoenzyme hydrolysis

The second step of the reaction, phosphoenzyme hydrolysis, was simulated analogously to the first reaction step, but only the protonated mechanism was considered here. As can be seen in Figure 8, which summarizes the complete reaction, all steps of the reaction are significantly catalyzed by the enzyme compared to the uncatalyzed reference reaction in water. In particular, the activation barrier of the rate-limiting step, formation of the second pentacoordinated high-energy structure (Φ5→Φ6), is lowered by as much as 15 kcal/mol. The calculated rate-limiting barrier is 16 kcal/mol, in excellent agreement with the reported kcat value of 27.5 s⁻¹ for phenyl phosphate [39]. Since concerted bond breaking and leaving group protonation was found to be considerably favored over a stepwise mechanism in the first part of the reaction, the analogous concerted pathway was also modeled here. Simulation of the first step showed that the protein environment cannot stabilize a negatively charged ligand in the active site outside the phosphate binding loop, which would also be the case for a stepwise proton transfer to Asp129 followed by in-line attack of a hydroxide ion. The complete reaction, in which the phosphate group is effectively transferred from phenol to a water molecule, is exothermic, with a standard free energy change of −3 kcal/mol [30]. In Figure 8 the corresponding reaction free energy on the enzyme is −5 kcal/mol, in favor of the products. Here the apparent shift of the equilibrium includes the difference in binding of the product versus the substrate. Since inorganic phosphate is a competitive inhibitor of LMPTP [11], it is not unexpected that the product binds somewhat more strongly than the substrate and thus lowers the levels of Φ7 and Φ8 relative to Φ1 and Φ2. The reaction could also involve a free energy change when the leaving phenol group is exchanged for an incoming water molecule (Φ4→Φ5). The difference in binding free energy between a phenol molecule and a water molecule was therefore estimated using the linear interaction energy (LIE) method [41,42]. The absolute binding affinities of phenol and water to the active site were calculated as in [42], and the binding free energies of the two ligands were found to be very similar (−2 kcal/mol). The energetics of ligand exchange therefore do not affect the free energy profiles in Figure 8.

MD trajectories of the wild-type phosphoenzyme intermediate show that two water molecules interact directly with the phosphate group (Figure 9). One of these water molecules is in the right position for the hydrolysis reaction to occur. In PTP1B, Gln262 has been found to be an important residue for coordinating the nucleophilic water molecule. Mutating this residue to alanine resulted in phosphoenzyme trapping, which made it possible to crystallize the reaction intermediate [45]. Although LMPTP is very similar in active site structure, it has no corresponding glutamine. However, our simulations of the water attack (Φ5→Φ6) reveal that Cys17 interacts with the nucleophilic water, and this interaction appears to help position the water molecule for the reaction. The involvement of Cys17 in the phosphoenzyme hydrolysis step was proposed by Cirri et al. [46] already in 1993, before the structure was solved. When Cys17 was mutated to serine, the enzyme displayed low activity, but significant amounts of phosphoenzyme intermediate were trapped. This suggests that the larger thiol group orients the water molecule better than the smaller hydroxyl group in position 17. We also calculated the free energy profile (not shown) for the water attack (Φ5→Φ6) in the C17S mutant enzyme and found that the free energy barrier increased by 1.6 kcal/mol. This is consistent with the 6% residual activity relative to the wild-type enzyme reported by Davis et al. [47]. The polar and steric interactions between Cys17 and the water molecule that are involved in its appropriate positioning can be directly appreciated in Figure 9. Superimposing the active site residues of PTP1B and LMPTP reveals that the proposed water-coordinating residues (Gln262 in PTP1B and Cys17 in LMPTP) occupy the same spatial position relative to the active site, although they are not sequentially related. Cys17 is a residue in the phosphate binding loop, whereas Gln262 is positioned in a flexible loop that can apparently move in and out of the active site [45].

[Figure 8 diagram: free energy ΔG (kcal/mol) versus reaction coordinate for the uncatalyzed reaction in water and the LMPTP-catalyzed reaction.]
Figure 8. Complete free energy profile of the reaction mechanism catalyzed by LMPTP. The reaction coordinate refers to the valence bond states shown in Figure 3.

Figure 9. MD structure of the active site in the phosphoenzyme intermediate state (Φ5) viewed along the P-S bond. The two water molecules (W1 and W2) interacting with the phosphate group are shown, whereas the side chains of residues 13-16 are omitted for clarity. W1 is in the position for an attack on the phosphorus atom. The network of hydrogen bonds is shown as broken lines.
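The comparisons between a computed barrier and a measured kcat, and between a barrier increase and a residual activity, rest on transition state theory. The sketch below assumes the Eyring equation with a transmission coefficient of 1 and T = 298.15 K; the temperature and the helper names are our assumptions, while the input numbers (27.5 s⁻¹, 1.6 kcal/mol, 0.012 s⁻¹) come from the text.

```python
import math

KB = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def eyring_rate(dg_act, T=298.15):
    """First-order rate constant (s^-1) from an activation free energy (kcal/mol)."""
    return KB * T / H * math.exp(-dg_act / (R_KCAL * T))


def barrier_from_rate(k, T=298.15):
    """Inverse of eyring_rate: activation free energy (kcal/mol) from k (s^-1)."""
    return R_KCAL * T * math.log(KB * T / (H * k))


def relative_rate(ddg_act, T=298.15):
    """Rate ratio k_mutant / k_wt for a barrier increase ddg_act (kcal/mol)."""
    return math.exp(-ddg_act / (R_KCAL * T))


print(round(barrier_from_rate(27.5), 1))   # wild-type kcat, phenyl phosphate
print(round(relative_rate(1.6), 3))        # C17S barrier increase
print(round(barrier_from_rate(0.012), 1))  # D129A turnover rate
```

Under these assumptions a kcat of 27.5 s⁻¹ corresponds to a barrier of about 15.5 kcal/mol, close to the computed 16 kcal/mol; a 1.6 kcal/mol increase gives a rate ratio of about 0.07, in line with the 6% residual activity of C17S; and 0.012 s⁻¹ corresponds to about 20.1 kcal/mol. Likewise, 1/relative_rate(5.0) is roughly 4.6 × 10³, the same order as the rate decrease quoted for D129A.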
4.4. Reaction mechanism for mutants lacking the general acid residue

The D129A mutant of LMPTP has been extensively studied in enzymological experiments. This mutant lacks the catalytically important general acid/base residue. However, the mutant is not entirely inactive but retains an activity around 3000 times lower than that of the wild-type enzyme [17]. We have found that protonation of the leaving group is essential for catalysis of phenyl phosphate hydrolysis, since release of a negatively charged phenolate species is energetically disfavored. If the leaving group departs as an anion, we predict an energy barrier that is not compatible with the experimentally observed activity. We therefore propose that the phosphate group itself may act as an acid in the first reaction step of this mutant and protonate the leaving group concertedly with its release. The alternative reaction mechanism for D129A was simulated in the same way as the wild-type reaction, but with a slightly different set of valence bond states [48]. This hypothesis yields an activation barrier for the first step that is 5 kcal/mol higher than that of the corresponding wild-type step, which corresponds to a decrease in rate by a factor of 4000, consistent with experiments. It then seems reasonable that the phosphocysteine, with its charge of −2, could itself abstract a proton from the attacking water molecule in the second, hydrolysis step of the D129A mutant enzyme. This mechanism would be similar to the substrate-assisted reaction mechanism proposed for acylphosphatase [49]. The complete free energy profile for this reaction mechanism in the D129A mutant is shown in Figure 10. The free energy level of the phosphoenzyme intermediate lies somewhat below the initial enzyme-substrate level. Measured from this lowest point of the profile, the rate-limiting barrier Φ5→Φ6 is predicted to be 20 kcal/mol, in accordance with the observed turnover rate of 0.012 s⁻¹ [17].
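Measuring the rate-limiting barrier "from the lowest point of the profile" means taking the largest free energy rise from any state to a later state, not simply the highest barrier relative to the starting complex. A minimal sketch, under the simplifying assumption of a single turnover, scans a list of stationary-point free energies in reaction order; the function name and the profile values are hypothetical and are not the D129A numbers. The full energetic span model additionally adds the overall reaction free energy when the highest barrier precedes the lowest intermediate, which this sketch ignores.

```python
def rate_limiting_span(profile):
    """Largest free energy rise from a state to any later state in a profile.

    profile: free energies (kcal/mol) of successive minima and barrier tops,
    in reaction order. Returns (span, index_of_low_state, index_of_high_state).
    """
    best = (0.0, 0, 0)
    i_low = 0                        # index of the lowest state seen so far
    for j, g in enumerate(profile):
        rise = g - profile[i_low]
        if rise > best[0]:
            best = (rise, i_low, j)
        if g < profile[i_low]:
            i_low = j
    return best


# Hypothetical profile (kcal/mol) in which the intermediate at index 4
# lies below the starting complex, as described for D129A.
toy = [0.0, 12.0, 3.0, 15.0, -2.0, 18.0, -6.0]
print(rate_limiting_span(toy))
```

Here the highest barrier lies 18 kcal/mol above the starting complex, but the effective barrier is 20 kcal/mol because it is measured from the intermediate at −2 kcal/mol.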
For the wild-type enzyme, the phosphoenzyme intermediate is higher in energy than the initial enzyme-substrate complex, while in the mutant it is slightly lower. This implies that more phosphoenzyme intermediate should accumulate in the D129A mutant than in the wild type, which has indeed been observed in phosphoenzyme trapping experiments [17]. The good agreement of this proposed reaction mechanism with the available experimental data suggests that this pathway may actually be utilized by mutant LMPTP lacking the general acid/base Asp129.

4.5. The pKa of the catalytic cysteine is different in LMPTP and PTP1B

Since a negatively charged thiolate group is a better nucleophile than the protonated thiol, the catalytic cysteine is believed to be deprotonated prior to nucleophilic substitution. Depending on its pKa, the cysteine could be deprotonated already in the free enzyme, or it could be ionized after proton transfer to the substrate phosphate group in the Michaelis complex. The pKa of
[Figure 10 diagram: free energy ΔG (kcal/mol) versus reaction coordinate for LMPTP and the D129A mutant, states Φ1 to Φ8.]
Figure 10. Calculated reaction profiles for wild-type and D129A mutant LMPTP.
the cysteine is dependent on the surrounding environment, and it would therefore be interesting to study how the energetics of nucleophile activation differs among the PTPases. The active sites of all PTPases are very similar in sequence and structure. However, the LMPTPs differ from the other tyrosine-specific PTPases (e.g. PTP1B) in that there is no histidine residue preceding the catalytic cysteine in the sequence. The composition of amino acids surrounding the substrate binding site also differs between the two enzymes. We have employed the EVB method to study the energetics of nucleophile activation by proton transfer to a dianionic substrate in both LMPTP and human PTP1B. The two valence bond states Φ1 and Φ2 used in the calculations are shown in Figure 3. These states represent the reactants and products of the reaction in which a proton is transferred from the cysteine residue to the phenyl phosphate dianion. Starting coordinates for the protein simulations were the structures of bovine liver LMPTP in complex with sulfate and of human PTP1B (C215S mutant) in complex with phosphotyrosine (PDB entries 1PHR [5] and 1PTV [9], respectively). The EVB potential was calibrated to reproduce experimental data for the uncatalyzed reference reaction in solution. For proton transfer, the free energy difference between the two states, ΔG°, can be obtained from the difference in pKa between the donor and the acceptor. Once ΔG° is known, the activation energy ΔG‡ can be determined from a linear free energy relationship compiled by Eigen [50]. For the proton transfer described by Φ1→Φ2 the
[Figure 11 plot: free energy profiles along the reaction coordinate for proton transfer Φ1→Φ2 in water, LMPTP and PTP1B.]
Figure 11. Calculated energetics of proton transfer between the catalytic cysteine and the dianionic phosphate group of the substrate in LMPTP and PTP1B.
resulting difference in free energy is 3.5 kcal/mol and the activation energy is 7.8 kcal/mol. Because it is obtained from experimental data, this estimate of the barrier effectively includes zero-point energy and tunneling effects. In typical EVB studies of enzymatic reactions it is assumed that these quantum mechanical effects do not differ significantly between the water and enzyme environments. This assumption has been verified by implementing the path integral method [51] within the EVB framework [52,53]. The reaction free energy profile of the water simulation, containing only the solvated reacting fragments, was calculated using the sampling approach described above. The EVB parameters Δα and H12 were adjusted until the calculated profile reproduced the experimental values. The resulting values were then used when evaluating the corresponding simulations of the reaction in the two enzymes. The simulated free energy profiles for proton transfer are shown in Figure 11. The upper curve is the calibrated profile of the reference reaction in solution. The other two curves show that both enzymes have a significant catalytic effect on the proton transfer, i.e. both the activation energies and the free energy differences between the two states are lowered compared to the water simulation. Comparing the two enzymes, Φ2 appears to be energetically more stable in PTP1B than in LMPTP, which indicates that the catalytic cysteine has a pKa
approximately two units lower in the former enzyme. This is consistent with experimental data as well as with computed pKas [54].

4.6. Summary

Arylphosphate hydrolysis is effectively catalyzed by the PTPases without the active-site-bound cations that many other proteins handling phosphorylated substrates use to stabilize the negative charges of the reacting groups. The catalytic power of the PTPases instead arises from an active site structure that is designed to stabilize each step of the reaction. The major properties that contribute to catalysis can be summarized as follows:
I. The essential nucleophilic thiolate species is stabilized by interaction with a hydroxyl group and by a number of backbone amide hydrogen bonds. This stabilization lowers the pKa of the cysteine, favoring its activation (Φ1→Φ2).
II. The P-loop backbone amides and the side chain of the arginine residue provide optimal stabilization of the equatorial oxygens of the pentacoordinated transition states through a network of hydrogen bonds.
III. An increased pKa of the general acid/base residue results in a larger catalytic effect on the second, rate-limiting step, where the water molecule is activated by the general base (Φ5→Φ6), than on the first step, where the same residue acts as an acid (Φ3→Φ4).
The effects of these three features are clearly demonstrated by the above calculations, in which the free energies obtained for each step are in good agreement with experimental observations. The fact that the energetics of the mutant LMPTPs (D129A and C17S) are also consistent with experiments indicates that the present computational modeling approach can successfully describe the catalytic process in a PTPase. Importantly, the calculations show that the P-loop is designed to stabilize exactly two negative charges, which means that the reacting fragments (nucleophile and phosphate group) must be singly protonated. Establishing the protonation state of the groups involved is essential for a full understanding of the energetics of the catalytic mechanism, and the results presented here could therefore serve as a framework in which enzymological experiments may be interpreted.

5. SUBSTRATE TRAPPING IN CYSTEINE TO SERINE MUTATED PTPases
The cysteine residue in the catalytic loop (Cys12 in LMPTP, Cys215 in PTP1B) is the essential nucleophile for PTPase activity. Experiments show that cysteine-to-serine mutants are completely inactive but can still bind substrate molecules. The ability to bind substrates without hydrolyzing them is called substrate trapping and has been exploited when searching for native PTPase
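$$\Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{bind}}(\mathrm{Ser}) - \Delta G_{\mathrm{bind}}(\mathrm{Cys}) = \Delta G_{\mathrm{Cys}\to\mathrm{Ser}}(\mathrm{E{\cdot}S}) - \Delta G_{\mathrm{Cys}\to\mathrm{Ser}}(\mathrm{E}) \qquad (10)$$
Figure 12. Thermodynamic cycle for determination of the difference in binding free energies between wild-type and Cys→Ser mutated PTPases.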
substrates in cell extracts. The Cys→Ser mutant is expected to have lower activity than the wild type, since the hydroxyl group of the serine residue is a worse nucleophile than the thiol group of the cysteine. However, it is not entirely clear why this mutant is completely inactive. The substrate binding properties of wild-type and Cys→Ser mutated PTPases are readily investigated by relative binding free energy calculations. Using free energy perturbation (FEP) according to the thermodynamic cycle shown in Figure 12, the difference in affinity between the wild-type and mutant proteins can be obtained. Two sets of simulations are necessary for each protein: one with the empty solvated structure and one with the phenyl phosphate dianion bound in the active site. The catalytic cysteine is then slowly transformed to serine, and vice versa, in both enzymes. The perturbations include changes of charges, van der Waals parameters, bond lengths and one bond angle for the three atoms Cβ-S/O-H. The simulations were performed using the PTPase crystal structures described above. The PTP1B structure was a serine mutant with a phosphotyrosine ligand, which was manually replaced by phenyl phosphate. Acidic and basic residues close to the active site were charged, whereas those outside the simulation sphere and distant from the active site were replaced by polar neutral groups, giving the system a total charge of zero [37]. Each simulation was prepared by slow heating of the system from 1 to 300 K, followed by 100 ps of equilibration at 300 K. The perturbations were performed at
Table 1. Results obtained from the FEP simulations.

Simulated system | ΔG Cys→Ser (kcal/mol)¹ | ΔG Ser→Cys (kcal/mol)¹ | ΔGave Cys→Ser (kcal/mol)² | ΔΔGbind (kcal/mol)³
free PTP1B | −7.6 ± 0.1 | 6.5 ± 0.1 | −7.1 ± 0.4 |
PTP1B + ligand | −15.2 ± 0.1 | 13.5 ± 0.1 | −14.3 ± 0.5 | −7.3 ± 0.8
free LMPTP | −9.5 ± 0.1 | 9.0 ± 0.1 | −9.2 ± 0.2 |
LMPTP + ligand | −17.2 ± 0.1 | 17.3 ± 0.1 | −17.2 ± 0.1 | −8.0 ± 0.3
this temperature using 51 λ-steps, 1 fs time steps and 5 ps of sampling at each λ. Energies were collected every fifth fs. The energy data from the first 2 ps of each λ-step were discarded as equilibration. Forward and backward FEP simulations were performed, starting from mutant PTP1B and wild-type LMPTP. The backward perturbations, preceded by 25-50 ps of equilibration, were started from the endpoints of the forward runs. The change in free energy for each perturbation (ΔG1 and ΔG2) was calculated using Equation 6. The results in Table 1 show that the forward and backward simulations yield similar energies with small standard deviations. The negative ΔΔGbind values predict that the serine mutants of both LMPTP and PTP1B bind phenyl phosphate 7-8 kcal/mol more strongly than the native enzymes. The stronger stabilization of the enzyme-substrate complex relative to the transition state, together with the higher pKa of the hydroxyl group compared to the thiol group, is therefore likely to be the major reason for the complete lack of activity.

6. PREDICTION OF A LIGAND INDUCED CONFORMATIONAL CHANGE IN THE ACTIVE SITE OF CDC25A

The cell cycle control phosphatases Cdc25 are dual specificity phosphatases (DSPases) that dephosphorylate both phosphothreonine and phosphotyrosine
¹ The free energy difference obtained from the simulation. Ser→Cys indicates the forward mutation and Cys→Ser the backward mutation. The error is the convergence error obtained from summation in the two directions on the same trajectory.
² The average free energy difference calculated from the two independent simulations of columns 2 and 3, with the standard error of the mean. The sign of this value corresponds to the Cys→Ser mutation.
³ The difference in binding free energy between the wild-type-ligand complex and the mutant-ligand complex. A negative value indicates that the serine mutant binds the ligand more strongly than the wild type. The error is the sum of the errors of the terms.
Figure 13. a) Superposition of the backbone atoms of residues 430-436 in Cdc25A (white, PDB entry 1C25) and 12-18 in LMPTP (gray, PDB entry 1PHR). The sulfate ion is found in the crystal structure 1PHR. In addition to Ser434 and Glu435, only the totally conserved side chains Cys and Arg are shown for clarity. b) Average MD structure of the ligand complex with Cdc25A after the observed conformational change, superimposed on LMPTP as in Figure 13a.

residues of their substrate proteins. The determination of the apo-protein structure of Cdc25A revealed that this enzyme has a completely different fold from all other phosphatases crystallized to date [55]. Although different in fold, the crystal structure confirms the expected features of the characteristic active site containing the C-X5-R motif. Crystal structures of PTPases and DSPases in complex with various ligands show a common active site structure, with the backbone amides of the loop residues pointing into the center of the crevice in order to stabilize the equatorial oxygens of the phosphomimetic group. The conformation of the corresponding region in the unliganded structure of Cdc25A is different: here one of the peptide bonds points its amide group in the opposite direction. This conformation would be strongly disfavored in the enzyme-substrate complex, with the negatively charged phosphate group positioned in the active site. It would also have a destabilizing effect on the transition state, which requires maximal stabilization of the equatorial oxygens for efficient catalysis. Some structural features of this protein were studied by MD simulations of the Cdc25A apo-structure (PDB entry 1C25 [55]) and of a modeled Cdc25A-ligand complex. The MD trajectories of the apo-structure in water were run for 65 ps at 300 K and showed relatively stable energies and well
Figure 14. Ramachandran trajectory plot of Ser434 and Glu435 during the first 5 ps of the MD simulation at 300 K. The arrows indicate the direction of the trajectories. Shaded areas correspond to the generally allowed regions for φ and ψ.

retained structures. Simulation of the modeled complex of Cdc25A and SO4²⁻ yielded stable trajectories after 2.5 ps of simulation at 300 K. However, already at 2 ps a conformational change occurred in the backbone peptide bond between residues Ser434 and Glu435. In the starting (crystal) structure (Figure 13a) the dipole of this peptide bond is 'inverted' compared to the pattern seen in the other H-C-X5-R containing structures. 'Inverted' here means that the carbonyl oxygen points into the phosphate binding site and the amide nitrogen points outwards. As expected, this conformation is electrostatically unfavorable when a negatively charged ligand is bound in the phosphate binding loop, and the dipole therefore spontaneously flips over to the preferred conformation when the sulfate ion is present (Figure 13b). The conformational change was clearly monitored by measuring the Ramachandran angles of residues Ser434 and Glu435 during the MD trajectory (Figure 14). The diagram shows that residue 435 has a strained conformation in the starting structure, with the Ramachandran angles φ and ψ being unstable and located outside the allowed regions. As the trajectory proceeds, the torsional angle φ changes and the Ramachandran trajectory ends up in the allowed region of the diagram. For Ser434, mainly the ψ angle changes its value, and the Ramachandran trajectory is displaced from the allowed region defined by β strands into the region typical of α-helical structures. This ligand-induced conformational change yields a structure that is similar to the other PTPase-ligand complexes that have been crystallized. The results
obtained from the MD simulations emphasize the importance of the active site conformation of the PTPases and DSPases with respect to substrate binding and catalysis. We suggest that the type of conformational change observed upon ligand binding in Cdc25A is an important molecular switch in the catalytic process [56].

7. KINETIC ISOTOPE EFFECTS IN PHOSPHORYL TRANSFER REACTIONS
In the case of phosphate monoester hydrolysis by PTPases, it has been argued that the reported ¹⁸O isotope effects for the non-bridge phosphate oxygens, ¹⁸(Vmax/Km)nonbridge, show that the reacting groups are unprotonated [44]. However, the ¹⁸(V/K)nonbridge values were then corrected, by the ¹⁸O isotope effect of deprotonation, for the fraction of monoanion (which needs to be deprotonated if only the dianion is reactive) present under the experimental conditions. Since such a correction assumes that the reactive species is the phosphate dianion (together with the thiolate), the result cannot be used to prove that assumption. One could equally well assume that the reactive species is the phosphate monoanion plus thiolate, or the dianion plus thiol (with proton transfer according to Φ1→Φ2), in which case the correction would go in the opposite direction. This would lead to a corrected value of ¹⁸(V/K)nonbridge = 1.017, which is in fact very similar to that observed for hydrolysis of p-nitrophenyl phosphate in solution [44]. The ¹⁸O isotope effects from the three non-bridging phosphate oxygens are often used as diagnostic tools for investigating the details of phosphoryl transfer reactions in enzymes and in solution. It would be interesting to investigate whether the experimentally observed kinetic isotope effects (KIEs) for phosphoryl transfer in solution can be reproduced by ab initio calculations. If so, the calculations might tell us something about the probability of the possible pathways.

7.1. Calculations of heavy atom kinetic isotope effects in phosphate monoester hydrolysis

Hydrolysis of phosphate esters is one of the fundamental biochemical reactions, and a vast amount of research has been devoted to the study of phosphoryl transfer reactions [57-60], both in solution and in enzymes.
Despite these efforts there are still ambiguities in the interpretation of experimental data (e.g., linear free energy relationships, kinetic isotope effects, crystal structures of enzyme-inhibitor complexes) in terms of detailed reaction mechanisms [21,25,59,60]. Of particular interest has been to determine whether these reactions follow associative or dissociative pathways (Figure 15). Here we report an attempt to address the issue of heavy atom kinetic isotope effects (KIEs) in phosphate ester monoanion hydrolysis by quantum mechanical calculations. The use of ¹⁸O isotope effects from the three non-bridging phosphate oxygens as a diagnostic for phosphoryl transfer mechanisms was pioneered by Cleland and coworkers [61,62]. Intuitively, the non-bridge ¹⁸O KIE would be expected to be normal for an associative mechanism and inverse for a dissociative one, judging from the formal equatorial bond order to P in pentacovalent and metaphosphate-like (transition) structures (Figure 16). For hydrolysis of the monoanion the situation is, however, complicated by a significant ¹⁸O equilibrium isotope effect (EIE) on deprotonation [61]. Thus, the interpretation of experimentally measured KIEs depends on assumptions regarding proton transfers during the reaction [62]. To investigate this problem, we use ab initio methods to calculate the effect of isotopic substitution on the gas-phase free energies of the alternative reaction pathways. The ¹⁸O kinetic isotope effect on methyl phosphate monoanion hydrolysis is calculated using the general formula

$$\frac{^{16}k}{^{18}k} = \exp\left[\frac{\Delta G^{\ddagger}_{18} - \Delta G^{\ddagger}_{16}}{RT}\right] \qquad (11)$$
where AG,*6 and AG,~ are the reaction activation free energies for 160 and ~80 isotopes in the non-bridging positions, respectively. The values of AG~are obtained from ab initio calculations as described below. As a check of the computational procedures, we also calculate the equilibrium ~80 isotope effect for deprotonation of methylphosphate and orthophosphoric acid monoanion, [61 ], from the ratio
$$\frac{{}^{16}K_{\mathrm{eq}}}{{}^{18}K_{\mathrm{eq}}} = \exp\left(\frac{\Delta G_{18} - \Delta G_{16}}{RT}\right), \qquad (12)$$
where $\Delta G_{16}$ and $\Delta G_{18}$ are the corresponding free energy differences between the mono- and dianions. All calculations of structures and energies for the reacting species are performed using the Gaussian 94 program [63]. Structures of the stationary points are fully optimized in redundant internal coordinates and characterized by subsequent frequency calculations. Geometry optimizations and thermodynamic calculations are performed using the 6-31G*, 6-31+G* and 6-31++G** basis sets [63,64], which contain polarization functions and, in the latter two, diffuse functions. The calculated vibrational frequencies are scaled by the conventional factor of 0.8929 [64]. Electron correlation is included at the second-order Møller-Plesset perturbation theory level (MP2) for the 6-31++G** optimized structures.

Figure 15. Reacting species involved in the associative and dissociative reaction pathways (overall reaction MePO₃H⁻ + H₂O → PO₄H₂⁻ + CH₃OH).

The optimized transition state structures and the reaction free energies are shown in Figures 16 and 17, respectively. The energies of the reaction steps are calculated relative to the sum of the energies of the separated reactants 1 and 2. The search for the first transition state in the associative mechanism reveals a symmetric transition state, 4TS, that has not been described previously. In 4TS, the carbon atom is oriented anti relative to the unprotonated oxygen atom (Figure 16). The hydrogen atom originating from the reactant water molecule is largely transferred to the participating oxygen of methylphosphate. As a result, both equatorial hydroxyl hydrogens of the phosphate group have a symmetric syn orientation relative to the oxygen atom of the attacking water. This symmetry shows that 4TS has the same structure as the transition state for attack of hydroxide ion on the neutral form of 1, MeH₂PO₄. A very similar spatial arrangement of the hydroxyl groups is found in the subsequent intermediate 5, which contains a pentacoordinated phosphorus atom. The symmetric transition state 4TS is 11.7 kcal/mol more stable (at the HF/6-31++G** level) than the TS structure reported for this step in [21], and is in fact very close in energy to the intermediate 5 (Figure 17). The pentacoordinated species undergoes internal rotations of the hydroxyl and methoxy groups before reaching the next transition state, 6TS, on the way to expelling the methanol molecule 8. In 6TS, the leaving hydrogen atom of the hydroxyl group remains largely bonded to the orthophosphate rather than to the methoxide ion.
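Equations (11) and (12) have the same form, so a single helper can evaluate either a KIE from activation free energies or an EIE from reaction free energies. The Python sketch below is illustrative only, not the authors' code; it assumes free energies in kcal/mol, T = 298.15 K and R = 1.987204 × 10⁻³ kcal mol⁻¹ K⁻¹, and the helper names are hypothetical. Inverting the relation shows how small the isotope-dependent free energy differences are.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def isotope_effect(dg_light: float, dg_heavy: float, temperature: float = 298.15) -> float:
    """Light/heavy isotope effect (16k/18k or 16K/18K), as in Eqs. (11)/(12).

    dg_light, dg_heavy: activation (KIE) or reaction (EIE) free energies in kcal/mol
    for the 16O and 18O isotopologues, respectively.
    """
    return math.exp((dg_heavy - dg_light) / (R_KCAL * temperature))


def ddg_from_isotope_effect(ie: float, temperature: float = 298.15) -> float:
    """Invert Eqs. (11)/(12): return dG_heavy - dG_light in kcal/mol."""
    return R_KCAL * temperature * math.log(ie)


# Free-energy difference behind the experimental non-bridge KIE of 1.013 (Table 2)
ddg = ddg_from_isotope_effect(1.013)
print(f"ddG = {ddg:.4f} kcal/mol")
print(f"round trip: {isotope_effect(0.0, ddg):.4f}")
```

The difference is below 0.01 kcal/mol, which illustrates why such isotope effects are sensitive to small errors in the computed vibrational frequencies.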
Figure 16. Optimized HF/6-31++G** geometries of the transition state structures in the associative and dissociative reaction pathways.

Transition states along the dissociative pathway, 9TS and 12TS, correspond to abstraction of the methanol molecule in the first step and to the subsequent attack of the water molecule on metaphosphate. In the first dissociative transition state, the participating proton is largely transferred to the leaving group, while in the second transition state the hydrogen essentially remains bonded to the nucleophile. Comparing the two alternative pathways, one finds that the free energy, including the MP2 correction, at the highest point along the associative route (6TS) is 1.2 kcal/mol higher than that at the highest point on the dissociative path (12TS). The gas-phase energy for water attack on the phosphate moiety is considerably lower for the associative mechanism, while the activation energy for methanol abstraction is lower for the dissociative mechanism. The energetics of the corresponding reactions in solution are likely to be modulated by several factors related to solute-solvent interactions. To address this issue properly, one should apply several different approaches for including solvent effects on the reaction energetics [21,65], and even these might not provide a conclusive answer regarding the exact mechanism. Nevertheless, it is worth noting that the calculated overall activation barriers are close to those observed experimentally for the solution reaction [26,66].
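To put the 1.2 kcal/mol difference between the highest points of the two routes in perspective, the Python sketch below converts it into a rate ratio. It is a rough, hypothetical estimate: it assumes that each route's highest point acts as its effective barrier and that both channels share the same transition-state-theory prefactor, and it applies only to the gas-phase energetics, which solvation may modulate as noted above.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def rate_ratio(barrier_difference: float, temperature: float = 298.15) -> float:
    """k_low / k_high for two channels whose effective barriers differ by
    barrier_difference (kcal/mol), assuming identical TST prefactors."""
    return math.exp(barrier_difference / (R_KCAL * temperature))


# 6TS (associative) lies 1.2 kcal/mol above 12TS (dissociative) in the gas phase
ratio = rate_ratio(1.2)
print(f"k_dissociative / k_associative ~ {ratio:.1f}")
```

A factor of roughly 7.6 corresponds to only 1.2 kcal/mol, a difference small enough that solvent effects or the level of theory could plausibly change the ordering of the two routes.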
Our approach here is to evaluate the heavy atom KIEs, which have been measured experimentally and can provide important insight into the reaction mechanism. Quantitative prediction of heavy atom isotope effects is, in general, a difficult computational problem, associated with the assumed need for the highest possible level of ab initio theory and with potentially important contributions from tunneling and from coupling of solute and solvent vibrational modes [64,67-69]. In the present calculations, which deal with a medium-size system, viz. the methyl phosphate monoanion, we are forced to restrict the treatment to the HF level. In addition, the effects of the polar medium are taken into account using the SCRF model, in which the solute molecule is placed in a spherical cavity in a dielectric continuum. The results of EIE and KIE calculations using the split-valence 6-31G* basis set, as well as basis sets augmented with extra diffuse and polarization functions on heavy atoms and hydrogens (6-31+G* and 6-31++G**), are given in Table 2. For the deprotonation of orthophosphate, we obtain a substantial normal equilibrium isotope effect, in agreement with solution experiments, although it is clearly overestimated by the calculations. For the KIEs, none of the transition states reproduces the normal isotope effect observed experimentally. Both reaction steps of the dissociative pathway show small inverse KIEs, while in the associative mechanism step 2 gives a small normal KIE of 1.0021. The inverse isotope effect predicted for the first associative transition state can be rationalized in terms of well-advanced proton transfer to the phosphate group, so that the EIE of protonating the phosphate group manifests itself. The second transition state corresponds to partial deprotonation of the equatorial oxygen atom, and this indeed gives a normal KIE for this step.
However, the overall KIE for the associative reaction is less than 1. Similar KIE values are obtained in all cases when only the zero-point vibrational energy contributions are considered. The actual differences in calculated KIE between the associative and dissociative mechanisms depend on a rather complicated pattern of contributions from bond-stretching and bending frequencies. As expected [64,69], expansion of the basis set has some effect on the calculated EIE and KIE values, but the qualitative picture remains the same. The fact that the EIE on phosphate deprotonation is substantially overestimated compared to experiment would presumably lead to an underestimation of the KIEs for the associative transition states. This is because both 4TS and 6TS have doubly protonated character on the equatorial oxygens, so that an overly inverse isotope effect on protonation of the phosphate monoanion would reduce the overall KIE for this reaction path. Conversely, one might expect the KIEs for the dissociative transition states to be somewhat overestimated, since the non-bridging oxygens in both 9TS and 12TS have unprotonated character.

Figure 17. Relative gas-phase free energies (vertical axis: ΔG, kcal/mol) of methylphosphate ester hydrolysis for the structures given in Figure 16. Free energies calculated at the MP2/6-31++G**//HF/6-31++G** level, T = 298.15 K, frequencies scaled by 0.8929.

Including solvent effects with continuum SCRF models has no significant effect either on the geometries of the reactive species (the ionic P-O bond lengths change by only about 0.01-0.02 Å) or on the calculated EIE and KIE values. However, the observed trend shows a small increase in the normal EIE for phosphate deprotonation, together with a correspondingly small decrease in the KIE for the associative mechanism and an increase in the KIE for the dissociative mechanism. Another clue for improving the theoretical description of these isotope effects is obtained when we try to rationalize the relatively large discrepancy between the experimental and calculated EIE for phosphate deprotonation. The ratio ν(¹⁶O)/ν(¹⁸O) for the O-H bond frequency can be estimated from Hooke's law as 1.00328. The change in zero-point energy on breaking the ¹⁶O-H bond in the reaction H₂PO₄⁻ → HPO₄²⁻ is calculated as 7.80 kcal/mol with the 6-31++G** basis set. Consequently, based only on the isotope effect on the reactant O-H frequency, the EIE for the hydrogen abstraction is estimated as 1.0440. The EIE for phosphate deprotonation based on the actual ZPEs of reactants and products is 1.0501, which is close to this estimate.
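The Hooke's-law estimate above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: integer masses for ¹⁶O, ¹⁸O and H (which give the quoted ratio 1.00328), the 7.80 kcal/mol ZPE change from the text applied to the whole O-H term, and Eq. (12) restricted to zero-point contributions. The zpe_kcal helper, which sums scaled harmonic wavenumbers, and its conversion constant are illustrative additions; all function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3       # gas constant, kcal/(mol K)
CM1_TO_KCAL = 2.859144e-3  # 1 cm^-1 expressed in kcal/mol (h c N_A)


def hooke_frequency_ratio(m_light: float, m_heavy: float, m_partner: float) -> float:
    """nu(light)/nu(heavy) for an X-H oscillator, using nu proportional to 1/sqrt(mu)."""
    mu_light = m_light * m_partner / (m_light + m_partner)
    mu_heavy = m_heavy * m_partner / (m_heavy + m_partner)
    return math.sqrt(mu_heavy / mu_light)


def zpe_kcal(wavenumbers_cm1, scale: float = 0.8929) -> float:
    """Harmonic zero-point energy, 0.5 * sum of scaled wavenumbers, in kcal/mol."""
    return 0.5 * scale * sum(wavenumbers_cm1) * CM1_TO_KCAL


def zpe_only_isotope_effect(dzpe_light: float, dzpe_heavy: float,
                            temperature: float = 298.15) -> float:
    """16K/18K from Eq. (12) keeping only ZPE changes (product minus reactant, kcal/mol)."""
    return math.exp((dzpe_heavy - dzpe_light) / (R_KCAL * temperature))


ratio = hooke_frequency_ratio(16.0, 18.0, 1.0)  # integer masses of 16O, 18O and H
zpe_oh_16 = 7.80               # kcal/mol lost on breaking the 16O-H bond (text value)
zpe_oh_18 = zpe_oh_16 / ratio  # heavier oxygen lowers the O-H frequency and its ZPE
eie = zpe_only_isotope_effect(-zpe_oh_16, -zpe_oh_18)
print(f"nu16/nu18 = {ratio:.5f}, EIE estimate = {eie:.4f}")
# prints: nu16/nu18 = 1.00328, EIE estimate = 1.0440
```

Passing the scaled frequency lists of the reactant and product isotopologues through zpe_kcal would correspond to the "actual ZPE" variant quoted in the text (1.0501); those frequency lists are not reproduced here.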
Table 2. Calculated and experimental values for ¹⁸O equilibrium and kinetic isotope effects in methylphosphate monoanion reactions.

Reaction                        IE, calculated (a)                                        IE, experiment
                                gas phase                              water
                                6-31G*   6-31+G*   6-31++G**           6-31++G**
H₂P¹⁸O₄⁻ ⇌ HP¹⁸O₄²⁻             1.0378   1.0455    1.0466 (1.0400)*    1.0466 (1.0444)*    1.019 (0.001) [61]
Me¹⁶OP¹⁸O₃H⁻ ⇌ Me¹⁶OP¹⁸O₃²⁻     1.0214   1.0279    1.0279              1.0301              1.015 (0.002) [61]
Associative hydrolysis          0.9936   0.9874    0.9884              0.9832              1.013 (0.002) [62]
Dissociative hydrolysis         0.9979   0.9969    0.9979              1.0000              1.013 (0.002) [62]

(a) Frequencies scaled by 0.8929, T = 298.15 K.
* One water molecule in complex with the phosphate.
The larger EIE in the latter case reflects the weakening of the P-O bonds in the phosphate dianion. Thus, ab initio calculations seemingly provide a realistic estimate of the isotope effect on phosphate deprotonation in the gas phase, but disagree with the experimental data in solution. This raises the question of whether water molecules play a more active role in phosphate hydrolysis than just providing a polar environment for the intramolecular processes. One promising, but not fully conclusive, attempt of this kind is reported in [65], where the authors searched for associative and dissociative hydrolytic pathways involving additional proton transfer through water molecules. Here we investigate the issue of solute vibrational coupling to the solvent in the calculation of isotope effects for the phosphate deprotonation reaction. We model the microscopic effects of the water molecules in the simplest manner, by considering the doubly hydrogen-bonded complex of inorganic phosphate with one water molecule. A substantial drop of 0.0066 in the calculated gas-phase EIE is observed for this model system (Table 2). The inclusion of continuum solvent effects using the Onsager model again works in the opposite direction and increases the EIE. These results indicate that calculations of isotope effects in phosphate hydrolysis are not reliable without a realistic treatment of the coupling of solute vibrations to the solvent (hydrogen bonds), although the inclusion of electron correlation might also substantially change the calculated KIE values. Extending the computational models to treat these effects should lead to a better understanding of heavy atom kinetic isotope effects in phosphate hydrolysis.
REFERENCES
[1] T. Hunter, Cell 58 (1989) 1013.
[2] E.H. Fischer, H. Charbonneau, N.K. Tonks, Science 253 (1991) 401.
[3] D. Barford, Z. Jia, N.K. Tonks, Nature Struct. Biol. 2 (1995) 1043.
[4] J.A. Stuckey, H.L. Schubert, E.B. Fauman, Z.-Y. Zhang, J.E. Dixon, M.A. Saper, Nature 370 (1994) 571.
[5] X.-D. Su, N. Taddei, M. Stefani, G. Ramponi, P. Nordlund, Nature 370 (1994) 575.
[6] T.M. Logan, M.M. Zhou, D.G. Nettesheim, R.L. Meadows, R.L. Van Etten, S.W. Fesik, Biochemistry 33 (1994) 11087.
[7] M. Zhang, R.L. Van Etten, C.V. Stauffacher, Biochemistry 33 (1994) 11097.
[8] D. Barford, A.J. Flint, N.K. Tonks, Science 263 (1994) 1297.
[9] Z. Jia, D. Barford, A.J. Flint, N.K. Tonks, Science 268 (1995) 1754.
[10] J. Yuvaniyama, J.M. Denu, J.E. Dixon, M.A. Saper, Science 272 (1995) 1328.
[11] M. Zhang, M. Zhou, R.L. Van Etten, C.V. Stauffacher, Biochemistry 36 (1997) 15.
[12] Z.-Y. Zhang, W.P. Malachowski, R.L. Van Etten, J.E. Dixon, J. Biol. Chem. 269 (1994) 8140.
[13] J.M. Denu, G. Zhou, Y. Guo, J.E. Dixon, Biochemistry 34 (1995) 3396.
[14] B. Evans, P.A. Tishmack, C. Pokalsky, M. Zhang, R.L. Van Etten, Biochemistry 35 (1996) 13609.
[15] M.S. Saini, S.L. Buchwald, R.L. Van Etten, J.R. Knowles, J. Biol. Chem. 256 (1981) 10453.
[16] Z. Zhang, E. Harms, R.L. Van Etten, J. Biol. Chem. 269 (1994) 25947.
[17] N. Taddei, P. Chiarugi, P. Cirri, T. Fiaschi, M. Stefani, M. Camici, G. Raugei, G. Ramponi, FEBS Lett. 350 (1994) 328.
[18] A. Warshel, Computer Modeling of Chemical Reactions in Enzymes and Solutions, Wiley, New York, 1991.
[19] A. Warshel, F. Sussman, J.-K. Hwang, J. Mol. Biol. 201 (1988) 139.
[20] J. Åqvist, A. Warshel, Chem. Rev. 93 (1993) 2523.
[21] J. Florián, A. Warshel, J. Phys. Chem. B 102 (1998) 719.
[22] J. Florián, J. Åqvist, A. Warshel, J. Am. Chem. Soc. 120 (1998) 11524.
[23] D.G. Gorenstein, B.A. Luxon, J.B. Findlay, J. Am. Chem. Soc. 101 (1979) 5869.
[24] A. Yliniemela, T. Uchimaru, K. Kanabe, K. Taira, J. Am. Chem. Soc. 115 (1993) 3032.
[25] J. Åqvist, K. Kolmodin, J. Florián, A. Warshel, Chemistry & Biology 6 (1999) R71.
[26] J.P. Guthrie, J. Am. Chem. Soc. 99 (1977) 3991.
[27] T. Hansson, P. Nordlund, J. Åqvist, J. Mol. Biol. 265 (1996) 118.
[28] J. Åqvist, M. Fothergill, J. Biol. Chem. 271 (1996) 10010.
[29] A.J. Kirby, A.G. Varvoglis, J. Am. Chem. Soc. 89 (1967) 415.
[30] S. Akerfeldt, Acta Chem. Scand. 17 (1963) 319.
[31] N. Bourne, A. Williams, J. Org. Chem. 49 (1984) 1200.
[32] K. Kolmodin, T. Hansson, J. Danielsson, J. Åqvist, ACS Symposium Series 721 (1998) 370.
[33] C.J. Cramer, D.G. Truhlar, Science 256 (1992) 213.
[34] J.M. Karle, I.I. Karle, Acta Cryst. C 44 (1988) 135.
[35] W.F. van Gunsteren, H.J.C. Berendsen, Groningen Molecular Simulation (GROMOS) Library Manual, Biomos BV, Nijenborgh 16, Groningen, The Netherlands, 1997.
[36] Insight II, Biosym/MSI, San Diego, USA, 1995.
[37] J. Marelius, K. Kolmodin, I. Feierberg, J. Åqvist, J. Mol. Graph. Model. 16 (1998) 213.
[38] F.S. Lee, A. Warshel, J. Chem. Phys. 97 (1992) 3100.
[39] Z.-Y. Zhang, R.L. Van Etten, J. Biol. Chem. 266 (1991) 1516.
[40] Z.-Y. Zhang, R.L. Van Etten, Biochemistry 30 (1991) 8954.
[41] J. Åqvist, C. Medina, J.-E. Samuelsson, Protein Eng. 7 (1994) 385.
[42] J. Marelius, M. Graffner-Nordberg, T. Hansson, A. Hallberg, J. Åqvist, J. Comput.-Aided Mol. Design 12 (1998) 119.
[43] K. Kolmodin, P. Nordlund, J. Åqvist, Proteins 36 (1999) 370.
[44] A.C. Hengge, Z. Yu, L. Wu, Z.-Y. Zhang, Biochemistry 36 (1997) 7928.
[45] A.D.B. Pannifer, A.J. Flint, N.K. Tonks, D. Barford, J. Biol. Chem. 273 (1998) 10454.
[46] P. Cirri, P. Chiarugi, G. Camici, G. Manao, G. Raugei, G. Capugi, G. Ramponi, Eur. J. Biol. Chem. 214 (1993) 637.
[47] J.P. Davis, M.-M. Zhou, R.L. Van Etten, J. Biol. Chem. 269 (1994) 8734.
[48] K. Kolmodin, J. Åqvist, FEBS Lett. 456 (1999) 301.
[49] M.M.G.M. Thunnissen, N. Taddei, G. Liguri, G. Ramponi, P. Nordlund, Structure 5 (1997) 69.
[50] M. Eigen, Angew. Chem. Int. Ed. Engl. 3 (1964) 1.
[51] J. Lobaugh, G.A. Voth, J. Chem. Phys. 100 (1994) 3039.
[52] J.K. Hwang, Z.T. Chu, A. Yadav, A. Warshel, J. Phys. Chem. 95 (1991) 8445.
[53] I. Feierberg, V. Luzhkov, J. Åqvist (submitted).
[54] G.H. Peters, T.M. Frimurer, O.H. Olsen, Biochemistry 37 (1998) 5383.
[55] E.B. Fauman, J.P. Cogswell, B. Lovejoy, W.J. Rocque, W. Holmes, V.G. Montana, H. Piwnica-Worms, M.J. Rink, M.A. Saper, Cell 93 (1998) 617.
[56] K. Kolmodin, J. Åqvist, FEBS Lett. 465 (2000) 8.
[57] S.J. Benkovic, K.J. Schray, in: P.D. Boyer (Ed.), The Enzymes, Academic Press, New York, 1973, pp. 201-238.
[58] G.R.J. Thatcher, R. Kluger, Adv. Phys. Org. Chem. 25 (1989) 99.
[59] S. Admiraal, D. Herschlag, Chemistry & Biology 2 (1995) 729.
[60] K. Scheffzek, M.R. Ahmadian, W. Kabsch, L. Wiesmüller, A. Lautwein, F. Schmitz, A. Wittinghofer, Science 277 (1997) 333.
[61] W.B. Knight, P.M. Weiss, W.W. Cleland, J. Am. Chem. Soc. 108 (1986) 2759.
[62] P.M. Weiss, W.B. Knight, W.W. Cleland, J. Am. Chem. Soc. 108 (1986) 2761.
[63] M.J. Frisch, et al., Gaussian 94, Revision B2, Gaussian, Inc., Pittsburgh, PA, 1995.
[64] J.B. Foresman, A. Frisch, Exploring Chemistry with Electronic Structure Methods, 2nd ed., Gaussian, Inc., Pittsburgh, PA, 1996.
[65] C.-H. Hu, T. Brinck, J. Phys. Chem. A 103 (1999) 5379.
[66] C.A. Bunton, D.R. Llewellyn, K.G. Oldham, C.A. Vernon, J. Chem. Soc. (1961) 2670.
[67] N.J. Harris, J. Phys. Chem. 99 (1995) 14689.
[68] W.-P. Hu, D.G. Truhlar, J. Am. Chem. Soc. 118 (1996) 860.
[69] S.S. Glad, F. Jensen, J. Phys. Chem. 100 (1996) 16892.
L.A. Eriksson (Editor), Theoretical Biochemistry: Processes and Properties of Biological Systems, Theoretical and Computational Chemistry, Vol. 9, © 2001 Elsevier Science B.V. All rights reserved.
Chapter 8
Monte Carlo simulations of HIV-1 protease binding dynamics and thermodynamics with ensembles of protein conformations: incorporating protein flexibility in deciphering mechanisms of molecular recognition

Gennady M. Verkhivker*, Djamal Bouzida, Daniel K. Gehlhaar, Paul A. Rejto, Lana Schaffer, Sandra Arthurs, Anthony B. Colson, Stephan T. Freer, Veda Larson, Brock A. Luty, Tami Marrone, and Peter W. Rose
Agouron Pharmaceuticals, Inc., A Warner-Lambert Company, 10777 Science Center Drive, San Diego, CA 92121-1111, USA
*The corresponding author

I. Structural models of molecular recognition

Understanding of molecular recognition mechanisms has been greatly advanced in the last decade by computer simulations of ligand-protein interactions at the atomic level [1-12] and by studies of the underlying energy landscape, which describes the free energy of the system as a function of its coordinates [13-23]. It has been recognized that proteins are not adequately described by a single conformational state, but are better represented by a manifold of low-energy protein conformations, or conformational substates, on a rugged energy landscape [24-30]. A typical folded protein has a well-defined overall fold, but upon closer examination it may be seen as a myriad of different, nearly isoenergetic structures populated in thermal Boltzmann equilibrium. In the current view of the protein energy landscape, the conformational substates, which represent local minima of the protein, are organized in hierarchical tiers separated by barriers that can be crossed by thermal activation [27-30]. Within this hierarchy, alternative conformational states are defined by significant differences in protein conformation and large energy barriers, while modest coordinate changes, with concomitantly smaller energy barriers, characterize alternative conformational substates. Recent optical experiments have followed conformational fluctuations of myoglobin in real time and have suggested that proteins may have a hierarchy of energy barriers on different length and energy scales [27,28]. According to this emerging view, protein energy landscapes may be characterized by a number of discrete tiers of conformational substates: the substates within a tier have approximately the same barrier height, but separate tiers have different distributions of barrier heights. Furthermore, it was suggested that protein energy landscapes are self-similar, i.e. the protein fluctuations associated with each tier in the hierarchy of conformational substates belong to the same class of global conformational arrangements [27,28]. Accessibility of alternative conformational states is important for protein function, including assembly, molecular recognition, regulation of biological activity and enzymatic catalysis [27-30]. Protein structures determined in different environments (at high pressure, under various pH and solvent conditions, in different crystal forms, or bound to inhibitors) provide information about protein responses and protein conformational substates.
Protein mutants may also be regarded as local perturbations of the native structure, and comparison of mutant crystal structures typically reveals protein conformational substates [30]. Another type of perturbation of the protein conformation is usually seen during complex formation with peptides, substrates, ions or ligands. Like a typical folded protein, a ligand-protein complex generally has a well-defined native structure, but on a microscopic level it may exhibit structural disorder on different length and time scales: rotation of a local protein side chain, a conformational change of the ligand in the active site, or a collective conformational change involving movement of the protein backbone and side chains together with a change of the ligand binding mode. There are two regulatory mechanisms whereby binding can produce a significant conformational rearrangement of the ligand-protein complex. In kinetic regulation, a barrier separating two conformational substates is reduced or eliminated as a result of complex formation. In thermodynamic regulation, the free energy of an alternative conformational state is lowered so that it becomes the new free energy minimum. In both scenarios, the overall shape of the energy landscape is preserved, and the initial and alternative conformations remain local energy minima. Hence, local perturbations, for instance a single mutation in the active site or ligand binding, need only induce minor adjustments in a barrier height or in the relative energetics of the local minima to change the conformational substate of the system [30]. Conformational substates that represent local minima of the ligand-protein system can be organized in a hierarchy of ligand-protein binding modes and corresponding families of protein conformational fluctuations. Distinct, functionally important conformational substates of the HIV-1 protease have been observed by comparing the crystal structure of the unbound protease with crystal structures of the same protein in complexes with a diverse set of inhibitors [31-33]. With the aid of high-sensitivity differential scanning calorimetry, the balance of intra-subunit and inter-subunit energetic contributions was elucidated and the structural distribution of the forces that stabilize the HIV-1 protease was determined [34]. The stabilization free energy of the HIV-1 protease is primarily determined by the dimerization interface, whereas the isolated subunits are not stable. Only after dimerization, where a moderate decrease in conformational entropy is offset by a much larger increase in solvation entropy, does the overall entropy contribution become favorable for stabilization of the HIV-1 protease.
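The difference between the kinetic and thermodynamic regulation mechanisms described above can be illustrated with a toy two-substate model. This Python sketch is not part of the original analysis: the function names and the energies in the usage lines are hypothetical, and rate changes assume an unchanged Arrhenius/transition-state prefactor. Lowering the free energy of substate B shifts the Boltzmann populations (thermodynamic regulation), while lowering only the barrier speeds up interconversion without changing the populations (kinetic regulation).

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def population_b(dg_ab: float, temperature: float = 298.15) -> float:
    """Equilibrium fraction of substate B in a two-substate model; dg_ab = G_B - G_A (kcal/mol)."""
    return 1.0 / (1.0 + math.exp(dg_ab / (R_KCAL * temperature)))


def rate_factor(barrier_change: float, temperature: float = 298.15) -> float:
    """Factor by which the A <-> B interconversion rate changes when the barrier changes
    by barrier_change (kcal/mol), assuming an unchanged Arrhenius/TST prefactor."""
    return math.exp(-barrier_change / (R_KCAL * temperature))


# Hypothetical numbers: B lies 1.5 kcal/mol above A in the unbound protein.
print("unbound, fraction B:", round(population_b(1.5), 3))
# Thermodynamic regulation: binding lowers G_B by 3 kcal/mol, so B becomes the minimum.
print("thermodynamic, fraction B:", round(population_b(1.5 - 3.0), 3))
# Kinetic regulation: binding lowers only the barrier by 3 kcal/mol; populations unchanged.
print("kinetic, rate speed-up:", round(rate_factor(-3.0)))
```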
A structure-based thermodynamic analysis has reproduced the balance of stabilizing contributions and the magnitude of the Gibbs free energy of HIV-1 protease stabilization, in agreement with experimental measurements of the free energy and its enthalpy and entropy components [34]. This approach is based on a structural parameterization of the folding and binding energetics of proteins, peptides and synthetic ligands [35-41], in which experimental results on the types of stabilizing forces in folding and binding were used to establish the energy model. The resulting binding free energy model rests on the conjecture that the physical forces governing ligand-protein binding are the same as those in protein folding [42-45]. It is widely recognized that the major components of protein stabilization are hydrophobic interactions and hydrogen bonds, with the hydrophobic effect, defined as the combined effect of protein internal van der Waals interactions and hydration of non-polar groups, being the dominant force stabilizing the protein structure [42-45]. Consequently, the structural parameterization of binding energetics calculates the enthalpy and entropy components of the Gibbs free energy separately, and includes electrostatic and ionization effects as well as the contribution from the change in translational degrees of freedom. The enthalpy contribution results from the formation of van der Waals interactions and hydrogen bonds and the concomitant desolvation of the interacting groups; it is parameterized in terms of changes in apolar and polar solvent-accessible surface areas. The entropy contribution is composed of a solvation component and changes in conformational degrees of freedom. The magnitude of the conformational entropy contribution for each amino acid has been estimated by computing the probability profiles of different conformational states as a function of dihedral angles [35,39]. A detailed structural mapping of ligand-protein binding energetics has been performed for a number of peptidic and synthetic HIV-1 protease inhibitors [40]. Thermodynamic analysis has shown that inhibitor binding to the HIV-1 protease is not enthalpically favored and that the major contribution to the Gibbs free energy comes from the hydrophobic effect, resulting from the favorable entropy of water molecules released from ligand and protein groups. The enthalpy contributions are unfavorable at room temperature and are dominated by the positive enthalpy of desolvating hydrophobic groups.
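The bookkeeping described above, with enthalpy and entropy computed separately and then combined into the Gibbs free energy, can be sketched as follows. This is a schematic Python illustration, not the published parameterization: the linear dependence of the solvation entropy on surface-area changes, the function and field names, and the omission of electrostatic and ionization terms are simplifying assumptions, and all coefficients must be supplied from a fitted model such as those cited in [35-41].

```python
from dataclasses import dataclass


@dataclass
class SurfaceChanges:
    """Changes on binding (bound minus free); negative areas mean surface is buried."""
    d_asa_apolar: float  # change in apolar solvent-accessible surface area, A^2
    d_asa_polar: float   # change in polar solvent-accessible surface area, A^2
    ds_conf: float       # conformational entropy change, cal/(mol K)
    ds_trans: float      # translational entropy change, cal/(mol K)


def binding_free_energy(s: SurfaceChanges, h_apolar: float, h_polar: float,
                        s_apolar: float, s_polar: float,
                        temperature: float = 298.15) -> float:
    """Delta G = Delta H - T Delta S in kcal/mol.

    Delta H and the solvation entropy are taken as linear in the surface-area changes;
    h_* (cal/(mol A^2)) and s_* (cal/(mol K A^2)) are caller-supplied coefficients.
    """
    dh = h_apolar * s.d_asa_apolar + h_polar * s.d_asa_polar       # cal/mol
    ds_solv = s_apolar * s.d_asa_apolar + s_polar * s.d_asa_polar  # cal/(mol K)
    ds = ds_solv + s.ds_conf + s.ds_trans                          # cal/(mol K)
    return (dh - temperature * ds) / 1000.0                        # kcal/mol
```

Keeping the enthalpy and entropy terms separate, rather than fitting a single free energy coefficient per area, is what allows such a model to be compared with calorimetric enthalpies and entropies as well as with binding constants.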
The driving force of binding, the entropy gain, is opposed by the positive enthalpy change, by the negative change in conformational entropy of the inhibitor and protease side chains, and by the negative change in translational entropy. These data are consistent with calorimetric analysis of hydration enthalpy and entropy contributions to protein folding [46,47], supporting the notion that the stabilizing forces in protein folding and ligand-protein binding are similar and that appropriately derived energetic models can describe both folding and binding phenomena. The structure-based thermodynamic method combines the binding free energy model with a formalism that computes the probabilities of individual amino acids being folded in native-like conformations, and thereby allows one to determine the structural stability of different protein regions [48-53]. In the single-site thermodynamic mutation approach, the cooperativity of interactions in the protein can be examined by computing the free energy of all available protein states given that a particular residue is held in its folded native conformation [54]. The HIV-1 protease stabilization free energy is not uniformly distributed along the dimerization interface, and the binding site has a dual character, with regions of high and low structural stability [55]. The flap region of the HIV-1 protease has only marginal stability and a high propensity to undergo independent local unfolding; it is forced into a closed conformation by favorable interactions with the inhibitor, even though the flap reorganization energy is unfavorable. The existence of multiple conformational substates of the HIV-1 protease, caused by the presence of several mobile regions that undergo local folding-unfolding transitions upon ligand binding, results in local cooperativity effects. This makes it possible to characterize the structural and energetic distributions of the protein response to energy perturbations originating at different locations in the protein, induced either by mutations or by ligand binding [54]. The effect of inhibitor binding on the stability and cooperativity of the HIV-1 protease was elucidated by identifying protease regions with high and low structural stability, which showed that local cooperativity effects are not limited to the active site residues but can propagate to a small subset of remote protease regions [55]. HIV-1 protease residues that have low structural stability in the uncomplexed binding site are involved in selective transmission of the binding stimulus to distant protease regions [56]. Structural disorder of the HIV-1 protease, localized in several mobile regions, and the dual character of the active site, with regions of high and low structural stability, can serve important biological functions, in particular in conferring inhibitor resistance through mutations in the HIV-1 protease [57-60].
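The idea of computing region folding probabilities from an ensemble of partially unfolded states, and of watching a local perturbation propagate to a remote region, can be illustrated with a deliberately small toy model. This Python sketch is not the structure-based method of [48-56]: it exhaustively enumerates all folded/unfolded combinations of a few hypothetical regions, assigns made-up unfolding free energies and pairwise couplings, and weights each state by its Boltzmann factor. Stabilizing one region, as a bound ligand might, raises the folded probability of a coupled remote region.

```python
import itertools
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def folded_probabilities(dg_unfold, coupling, temperature=298.15):
    """Probability that each region is folded in a toy ensemble of folded/unfolded regions.

    dg_unfold[i]: free energy (kcal/mol) to unfold region i on its own.
    coupling[(i, j)]: extra stabilization (kcal/mol) present only when regions i and j
    are both folded. Energies are relative to the fully folded state.
    """
    n = len(dg_unfold)
    rt = R_KCAL * temperature
    z = 0.0
    folded_weight = [0.0] * n
    for state in itertools.product((True, False), repeat=n):  # True = folded
        g = sum(dg for dg, folded in zip(dg_unfold, state) if not folded)
        g += sum(c for (i, j), c in coupling.items() if not (state[i] and state[j]))
        w = math.exp(-g / rt)  # Boltzmann weight of this state
        z += w
        for i, folded in enumerate(state):
            if folded:
                folded_weight[i] += w
    return [fw / z for fw in folded_weight]


# Hypothetical chain of three regions: 0 = binding-site region, 1 = linker, 2 = remote region
dg = [0.5, 1.0, 0.5]
links = {(0, 1): 1.0, (1, 2): 1.0}
before = folded_probabilities(dg, links)
after = folded_probabilities([dg[0] + 2.0, dg[1], dg[2]], links)  # "ligand" stabilizes region 0
print("before:", [round(p, 3) for p in before])
print("after: ", [round(p, 3) for p in after])
```

Exhaustive enumeration scales as 2ⁿ, so this form is only practical for a handful of regions; it is meant to show the statistical bookkeeping, not a production approach.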
II. Structure-based analysis of HIV-1 protease-inhibitor binding

A number of HIV-1 protease inhibitors used in the clinic as therapeutic agents have produced resistant variants with point mutations in various regions of the protease. Forty weeks of treatment with indinavir [61] produces a 15-fold resistant variant with five mutations, L10R/M46I/L63P/V82T/I84V [62]. A 10-25-fold resistant variant with five mutations, M46I/L63P/A71V/V82F/I84V, emerges in the presence of ritonavir [63,64], and a 100-fold resistant virus with G48V/L90M mutations appears during therapy with saquinavir [65,66]. The reduction in the binding affinity of saquinavir for the L90M, G48V and L90M/G48V mutants is primarily due to larger dissociation rate constants and a decrease in the internal equilibrium between the bound inhibitor with the protease flaps up and the bound inhibitor with the flaps down [67]. More evidence was recently provided for the conjecture that the reduction in affinity between HIV-1 protease inhibitors and a particular mutant can be due to a reduction in protease dimer stability, in addition to, and independent of, the intrinsic inhibitor affinity for the mutant dimer [68]. Thermodynamic equilibrium studies conducted for a number of inhibitors on several drug-resistant HIV-1 protease mutants (V82F, V82F/I84V, V82T/I84V and L90M) have shown that the reduction in binding affinity is due to the combined effect of dimer stability and inhibitor binding. Mutations conferring resistance are located in HIV-1 protease regions of different structural stability: the active site, the flaps, the dimer interface and the surface loops. Active site mutations typically play a leading role in modulating the affinity of the protease, because mutations usually accumulate in a stepwise fashion, appearing first in the active site and then in compensatory regions. Mutations located in the active site reduce the number of favorable van der Waals contacts, increase steric hindrance or produce unfavorable electrostatic interactions. In 40-65% of cases, the loss of binding affinity going from the wild-type HIV-1 protease to mutants can be attributed to amino acid mutations away from the active site that are not in direct contact with the inhibitor [69].
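A fold change in an inhibition or dissociation constant corresponds to a binding free energy change of RT ln(fold). The Python sketch below applies this to the resistance factors quoted above. Treating those factors as ratios of binding constants is an assumption made for illustration, since resistance is often measured in cell-based assays rather than as binding constants; the helper name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def ddg_from_fold_change(fold: float, temperature: float = 298.15) -> float:
    """Loss of binding free energy (kcal/mol) for a fold reduction in affinity,
    i.e. RT ln(K_mutant / K_wild-type) for dissociation or inhibition constants."""
    return R_KCAL * temperature * math.log(fold)


# Fold-resistance values quoted in the text, read (by assumption) as affinity ratios
for label, fold in [("indinavir, 15-fold", 15.0),
                    ("ritonavir, 10-fold", 10.0),
                    ("ritonavir, 25-fold", 25.0),
                    ("saquinavir, 100-fold", 100.0)]:
    print(f"{label}: ddG ~ {ddg_from_fold_change(fold):.1f} kcal/mol")
```

On this reading, the quoted resistance levels correspond to roughly 1.4-2.7 kcal/mol at 298 K, so even the 100-fold saquinavir resistance amounts to less than 3 kcal/mol of lost binding free energy.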
Substitutions outside the active site are thought to produce compensatory changes that affect the activity of the protease, either by altering the stability of the protease dimer or by indirectly influencing binding through long-range cooperative interactions [70]. According to the single-site thermodynamic mutation approach, binding perturbations in the low-stability flap region can trigger a large redistribution of the conformational ensemble of the protein, and the free energy required to bring the flap into the optimal binding conformation can be affected by distant mutations. This type of flexibility enhances the probability of generating resistant forms of the protease with mutations in the flap region. The growing body of structural and thermodynamic data has revealed similarities and differences in the molecular origins of inhibitor specificity against the HIV-1 protease and its various mutant forms. The crystal structures of three mutant proteases, I84V, V82 and V82F/I84V, in complexes with the cyclic urea-based inhibitors DMP323 and DMP450 have been solved to explain the modulation in inhibitor binding [70]. These mutations involve key protease residues associated with HIV-1 protease resistance to this class of inhibitors. The substitutions produce only local perturbations that alter the network of van der Waals ligand-protease contacts but retain the hydrogen bonding pattern. The effects of the mutations appear not to be additive, and compensatory shifts in the I84V and V82F/I84V complexes produce a small number of new contacts, which are insufficient to compensate for the initial loss of interactions caused by the mutations. In a subsequent study, inhibitors including indinavir [61,62], ritonavir [63,64], saquinavir [61,62], nelfinavir [71] and 14 cyclic urea-based inhibitors were tested against the V82F, V82, I84V and V82F/I84V mutants [72]. The single mutations V82F and I84V cause moderate changes in binding affinity compared to the wild-type complexes, while more significant changes have been observed for the double mutation V82F/I84V. It was suggested that the therapeutic effectiveness of the DMP323 and DMP450 inhibitors may be improved by increasing the size and flexibility of the inhibitor, so that it maintains a critical number of favorable contacts and accommodates protein conformational changes by forming new interactions to replace those lost at the mutation sites. A series of novel cyclic urea inhibitors was developed [73] based on the premise that the number of hydrogen bonding interactions between the inhibitor and the HIV-1 protease backbone should remain constant, while a larger number of non-bonded contacts must be maintained throughout the entire binding site.
Crystal structures of HIV-1 protease complexes with the DMP323, XV368 and SD146 inhibitors rationalized the dramatic improvement in the resistance profile exhibited by XV368 and SD146, larger and more flexible cyclic urea derivatives of the original DMP323 inhibitor [73]. Subsequently, the crystal structures of the three active-site mutant proteases V82F, I84V and V82F/I84V in complexes with XV368 and SD146 identified the interactions responsible for the high potency and broad specificity of these inhibitors [74]. These structural results suggested that high potency against the wild-type HIV-1 protease, together with retained affinity for a broad spectrum of resistance-conferring mutants, can be achieved by increasing the total number of hydrogen bonds while sustaining the hydrogen bonds formed to the protease backbone and preserving favorable ligand-protease contacts in all six enzyme subsites.
II.1. Structure-based analysis of HIV-1 protease-SB203386 inhibitor binding
Comparative structural analysis of HIV-1, HIV-2 and SIV proteases in complexes with the same inhibitor has revealed only minor differences and nearly identical protease tertiary structures [75-78], although the complexes may exhibit different ligand binding modes. An unexpected binding mode, with two symmetry-related inhibitor molecules each bound to one half of the active site, has been found in the complex of the SB203386 inhibitor with SIV protease [78]. Recently, it was determined that mutating residues 31-37 of the 30's loop alone produces in HIV-1 protease a resistance pattern against a broad range of inhibitors, including SB203386 [78-80]. To determine the individual contributions of the 30's loop residues to binding affinity and specificity, a number of chimeric proteases were constructed [80]. The crystal structures of SB203386 complexes with three chimeric HIV-1 proteases, denoted HIV-1 (2:31-37), HIV-1 (2:31,33-37) and HIV-1 (2:31-37,47,82), in which HIV-1 protease residues were replaced by the corresponding amino acids of HIV-2 protease, have recently been determined at high resolution [81]. These structures have provided significant additional insight into the molecular basis of the SB203386 selectivity pattern against HIV-1 protease mutants. The binding affinity of SB203386 generally decreases as the number of HIV-2 protease residues increases, except for HIV-1 (2:31-37,47,82), which reverts to a moderate affinity and to the wild-type binding mode. The HIV-1 protease triple mutant V32I/I47V/V82I, denoted HIV-1 (2:32,47,82), and the HIV-1 (2:31,33-37) and HIV-1 (2:31,33-37,47,82) chimeras have moderate and similar effects on SB203386 affinity [80].
While the wild-type complex and the HIV-1 (2:32,47,82) triple mutant have Ki values of 18 nM and 110 nM, respectively, the Ki values of the HIV-1 (2:31,33-37) and HIV-1 (2:31,33-37,47,82) chimeras are 210 nM and 460 nM. These mutations, however, are not nearly as detrimental to SB203386 binding affinity as their combined effect in the HIV-1 (2:31-37) chimera, whose Ki value of 1410 nM is similar to those of SB203386 complexes with HIV-2 protease (Ki = 1280 nM) and SIV protease (Ki = 960 nM). The binding mode of SB203386 in the HIV-1 protease triple mutant HIV-1 (2:32,47,82), in which HIV-1 protease residues were mutated to the corresponding amino acids of the HIV-2 and SIV proteases, remains identical to that of the wild-type complex. Introducing the HIV-2 residues at positions 31 and 33-37 moderately increases the Ki value (about 12-fold) and maintains the HIV-1 protease-like binding mode of SB203386. However, additionally introducing Ile at position 32, as in HIV-1 (2:31-37), increases the Ki to a value comparable to those of the HIV-2 and SIV proteases and changes the inhibitor binding mode to two ligand molecules in the active site, as seen in the SIV protease complex [81]. The binding mode of SB203386 in the complex with HIV-2 protease has not been determined crystallographically, but it is expected to be similar to that in the SIV complex [80,81]. This has led to the conjecture that SB203386 binding affinity and specificity may be conferred by a combination of the active-site residues 32, 47 and 82 together with the loop of residues 31-37, which lies mostly outside the active site. In the crystal structure of the HIV-1 (2:31-37) chimera complex with SB203386, structural changes in the vicinity of the active-site residue Ile32 result in an extension of the 80's loop residues towards the active site and a decrease in the size of the active-site cavity. These structural changes were also observed in the crystal structure of the SB203386 complex with SIV protease. It was suggested that not only may changes in the 30's loop affect the structural stability of the protease dimer, but the induced changes in the 80's loop may also have a detrimental effect on interactions with the inhibitor, leading to a significant reduction in SB203386 binding affinity [81]. In contrast, the crystal structure of the HIV-1 (2:31,33-37) chimeric complex does not show the 80's loop motions, and the observed loss in SB203386 binding affinity was attributed primarily to changes in dimer stability caused by mutations in the 30's loop sequence.
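The Ki values quoted above can be put on a common scale by computing fold changes relative to the wild-type complex and a corresponding free energy penalty. The sketch below is only an illustrative conversion: it assumes ddG = RT ln(Ki,mutant/Ki,wild-type), i.e. it treats Ki as a dissociation constant, and it assumes T = 298.15 K. The ddG numbers it prints are not values reported in [80,81].

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

# Ki values (nM) for SB203386 quoted in the text [80,81].
KI_NM = {
    "wild-type": 18.0,
    "HIV-1 (2:32,47,82)": 110.0,
    "HIV-1 (2:31,33-37)": 210.0,
    "HIV-1 (2:31,33-37,47,82)": 460.0,
    "HIV-1 (2:31-37)": 1410.0,
    "HIV-2": 1280.0,
    "SIV": 960.0,
}


def fold_and_ddg(ki, reference="wild-type", temperature=298.15):
    """Return {name: (fold change in Ki, ddG in kcal/mol)} relative to `reference`.

    Assumption: ddG = RT ln(Ki / Ki_ref), treating Ki as a dissociation constant.
    """
    rt = R_KCAL * temperature
    ref = ki[reference]
    return {name: (k / ref, rt * math.log(k / ref)) for name, k in ki.items()}


if __name__ == "__main__":
    for name, (fold, ddg) in fold_and_ddg(KI_NM).items():
        print(f"{name:26s} {fold:6.1f}-fold  ddG = {ddg:5.2f} kcal/mol")
```

On this scale, the roughly 12-fold increase for HIV-1 (2:31,33-37) corresponds to about 1.5 kcal/mol, and the HIV-1 (2:31-37) chimera to about 2.6 kcal/mol.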
The structural flexibility of the 30's and 80's loops observed in the HIV-1 (2:31-37,47,82) chimeric complex with SB203386, combined with a compensatory enlargement of the active-site cavity relative to the HIV-1 (2:31-37) complex, was suggested as the primary reason for the restoration of the wild-type ligand binding mode and the less dramatic loss of affinity [81]. A widely accepted two-step mechanism of HIV-1 protease binding involves the formation of a loose complex with the open form of the enzyme, followed by a conformational change in which the flap region closes over the active site to form the final bound complex. Consequently, differences in binding affinity between the HIV-1 protease and its mutants may also result from changes in the internal equilibrium between the bound form of the protease with the flaps closed and the unbound open form of the enzyme. The reduction in inhibitor binding affinity between the wild-type HIV-1 protease and a particular mutant can be due to changes in protease dimer stability, independent of differences in the inhibitor interaction energies. The flap shifts observed in the crystal structures of the chimeric HIV-1 (2:31,33-37), HIV-1 (2:31,33-37,47,82) and HIV-1 (2:31-37) proteases suggested that, in addition to changes in the enzyme-inhibitor interactions, a decreased stability of the closed form of the enzyme in solution may contribute to the reduction in binding affinity observed for SB203386 complexes with these chimeric proteases. In this study, however, we focus only on changes in the enzyme-inhibitor interactions and on the contribution of compensatory changes in the active-site residues to the reduced binding affinity of SB203386 for the HIV-1 protease mutants.
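The two-step mechanism described above implies that a measured affinity reflects both the initial encounter and the flap-closure equilibrium. For E + I ⇌ EI with K1 = [E][I]/[EI], followed by EI ⇌ EI* with closure constant K2 = [EI*]/[EI], the apparent dissociation constant is Kd,app = [E][I]/([EI] + [EI*]) = K1/(1 + K2). The short sketch below evaluates this expression; the numbers are hypothetical and only show that weakening flap closure alone raises Kd,app even when K1 is unchanged.

```python
def apparent_kd(k1, k2):
    """Apparent dissociation constant for E + I <-> EI <-> EI*.

    k1: dissociation constant of the loose (open-flap) complex, [E][I]/[EI].
    k2: closure equilibrium constant, [EI*]/[EI].
    """
    return k1 / (1.0 + k2)


# Hypothetical numbers: identical encounter complex, 10-fold weaker flap closure.
kd_wt = apparent_kd(1.0e-6, 100.0)
kd_mut = apparent_kd(1.0e-6, 10.0)
print(f"Kd,app ratio mutant/wild-type = {kd_mut / kd_wt:.2f}")  # prints 9.18 (= 101/11)
```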
III. Structure-based computational models of ligand-protein binding dynamics and molecular docking
Computational studies of molecular recognition usually require the consistent and rapid determination of the global energy minimum of a ligand-protein complex, which must correspond to the experimentally solved X-ray structure [1-12]. Recent advances in computational structure prediction of ligand-protein complexes use a diverse range of energetic models, based either on surface complementarity [82-89] or on atom-atom representations of the intermolecular interactions [90-95]. Docking optimization techniques, including Monte Carlo methods [96-98], molecular dynamics [99,100], genetic algorithms [101-103] and tabu search [104], have focused primarily on docking flexible ligands into proteins held fixed in a bound conformation, optimizing the internal degrees of freedom of the ligand and its rigid-body variables. Techniques that combine flexible ligand docking with protein side-chain optimization have recently been proposed in molecular recognition studies [105-107]. A variant of the dead-end elimination (DEE) algorithm has been used to avoid a combinatorial explosion by restricting both the ligand and the side chains of the receptor residues to a limited number of discrete low-energy conformations [105]. The combinatorial problem in flexible peptide docking to major histocompatibility complex receptors was also approached with the DEE algorithm, which was used to optimize protein side chains that adapt to the docked peptide conformations [106]. A hierarchical computational approach for predicting the structures of ligand-protein complexes and analyzing binding energy landscapes combines a Monte Carlo simulated annealing technique, which determines the bound ligand conformation, with the DEE algorithm for side-chain optimization of the protein active-site residues [107]. Limited protein side-chain flexibility has been employed in the GOLD program [103]. These approaches incorporate protein flexibility through rotamer libraries of side chains [105-107], Monte Carlo simulations combined with minimization in flexible binding sites [98], or molecular dynamics docking simulations [100]. The combination of energetic models with stochastic optimization techniques has led to a number of powerful strategies for computational structure prediction of ligand-protein complexes, and docking of flexible ligands to a protein with a rigid backbone and flexible side chains has now become more feasible [105-109]. The NP-hardness of the ligand-protein recognition problem, as of protein folding, implies that for a given protein there may be ligands that do not find the global free energy minimum on the binding energy landscape in a reasonable amount of computer time, given the high degree of complexity and frustration of the underlying landscape. Nevertheless, ligand-protein complexes with experimentally determined X-ray structures must recognize their global free energy minimum rapidly and consistently. The energy of the crystallographic structure of the ligand-protein complex must be the global minimum on the binding energy landscape, which is a thermodynamic requirement on the energy function in docking simulations, and this conformation must be accessible during the search, which is a kinetic condition of the docking problem.
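The dead-end elimination idea mentioned above can be illustrated with the classic Goldstein criterion: rotamer r at position i is pruned if some alternative t at the same position satisfies E(i_r) - E(i_t) + sum over j != i of min_s [E(i_r, j_s) - E(i_t, j_s)] > 0, because r can then never appear in the global minimum. This is a generic textbook form, not necessarily the specific DEE variant used in [105-107]; the self- and pair-energy tables are hypothetical inputs, and `brute_force_gmec` is a hypothetical helper included only to check the pruning on small problems.

```python
import itertools

import numpy as np


def _pair(pair_e, i, j):
    """Pair-energy table with rows = rotamers of i, columns = rotamers of j."""
    return pair_e[(i, j)] if i < j else pair_e[(j, i)].T


def goldstein_dee(self_e, pair_e):
    """Iterate the Goldstein DEE criterion until no rotamer can be pruned.

    self_e[i][r]   : self energy of rotamer r at position i.
    pair_e[(i, j)] : array E(i_r, j_s) for i < j.
    Returns the list of surviving rotamer indices for each position.
    """
    n = len(self_e)
    alive = [list(range(len(e))) for e in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                for t in alive[i]:
                    if t == r:
                        continue
                    # Lower bound on the energy change from swapping t -> r,
                    # minimized over the surviving rotamers at every other position.
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j != i:
                            p = _pair(pair_e, i, j)
                            gap += min(p[r, s] - p[t, s] for s in alive[j])
                    if gap > 0:  # r can never beat t: dead end
                        alive[i].remove(r)
                        changed = True
                        break
    return alive


def brute_force_gmec(self_e, pair_e):
    """Exhaustive global-minimum search, for checking DEE on small problems."""
    n = len(self_e)

    def total(conf):
        e = sum(self_e[i][conf[i]] for i in range(n))
        e += sum(pair_e[(i, j)][conf[i], conf[j]]
                 for i in range(n) for j in range(i + 1, n))
        return e

    return min(itertools.product(*(range(len(e)) for e in self_e)), key=total)
```

Because the criterion is a strict lower bound, the rotamers of the global minimum always survive; DEE only shrinks the space that a subsequent exhaustive or stochastic search has to cover.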
A simplified energy function in combination with an evolutionary sampling technique was developed to satisfy both the thermodynamic and the kinetic requirements of docking by reducing the frustration of the underlying binding energy landscape [91,92,110,111]. Robust structure prediction of bound ligands, given a fixed conformation of the native protein, can be achieved with a family of simplified knowledge-based energy functions that generate binding energy landscapes with coexisting correlated, funnel-like [15-23,112-114] and uncorrelated, rugged features. While adequate for non-polar and hydrogen-bonding patterns, this simplified energy function does not include a direct electrostatic component and therefore may be expected to fail when extensive networks of electrostatic interactions are present in the crystal structures. By contrast, the GOLD algorithm employs a template of protein hydrogen-bond donors and acceptors and uses a genetic algorithm to sample intermolecular hydrogen-bond networks and ligand conformations [103]. This approach lacks a desolvation component and was found to be less suitable for finding hydrophobic interactions. Docking methodologies implemented in programs such as Hammerhead [93], FlexX [94] and GOLD [103] have been validated on large numbers of ligand-protein complexes with known crystal structures to test the robustness of the methods. There have also been studies that employed explicit protein flexibility [115,116]. However, the results of flexible ligand docking to a receptor in the absence of any experimentally known bound protein conformation are considerably less reliable [117]. Applications of flexible ligand docking techniques range from the analysis of binding energy landscapes [118,119] to lead discovery [120], database mining [121] and structure-based combinatorial ligand design [122], and include simulations with ensembles of multiple ligands [123] and ensembles of multiple protein conformations [124,125]. A recently introduced molecular docking technique employs a set of related crystal structures as "snapshots" of a dominant protein conformation perturbed by different ligands, crystallization conditions and simple mutations [124]. The analysis of the effect of multiple protein conformational substates in response to ligand binding has led to practical recipes for effectively accounting for the types of protein flexibility that may occur upon ligand binding [124,125]. Docking simulations usually determine a single, lowest-energy structure of the complex and postulate that it corresponds to the native structure.
The number of low-energy structures is usually very large, and finding the lowest-energy structure, a computationally demanding task, does not by itself establish its thermodynamic stability. Instead, the structure prediction problem calls for determining the ensemble of many similar conformations that describe the thermodynamically stable native basin of the global energy minimum, not a single structure [126]. We have previously established that the results of kinetic docking simulations can be rationalized on the basis of the thermodynamic properties of ligand-protein binding determined from equilibrium simulations and from analysis of the binding energy landscape [118,119,127,128]. The robust topology of the native structure is a decisive factor contributing to the thermodynamics and dynamics of well-optimized ligand-protein complexes such as the methotrexate-dihydrofolate reductase (MTX-DHFR) system, which appear to be robust to structural perturbations, variations in ligand composition and the accuracy of the energetic model [127,128]. Topological features of the native complexes that are critical for robust structure prediction and thermodynamic stability are determined by the early ordering of the ligand recognition motif in its native conformation. The structural stability of these motifs contributes decisively to the topology and thermodynamic stability of the native ligand-protein complex [127,128]. These molecular fragments, termed recognition anchors, exhibit a high structural consensus, or accessibility of the dominant native binding mode, in docking simulations [20,111]. In addition, these fragments maintain the structural stability of their bound conformation when embedded in larger molecules, a property that we have termed structural harmony. For 'optimal' ligand-protein complexes, native interactions are on average stronger than non-native interactions, which results in a gradual energy decrease as native interactions are progressively formed and in a dominant conformational funnel leading to the native structure [118,119,127,128]. Comparing the results of validation docking experiments performed on a large number of Protein Data Bank (PDB) ligand-protein complexes with the GOLD program [103] and with our docking strategy [91,92], we have detected a number of complexes for which both methods fail to predict the crystal structure [129]. Misdocked predictions in ligand-protein docking can be categorized as soft and hard failures. A soft failure is defined as the case in which the energy of the crystal structure, after minimization with the chosen force field, is lower than the energy of the lowest-energy conformation found in the docking simulations. A soft failure is due to a flaw in the search algorithm, which is unable to find the global energy minimum.
Hard failures are more difficult; they arise when the energy of a misdocked structure is lower than the energy of the minimized crystal structure. Hard failures result from an inability to reproduce accurately the subtle differences in the relative energies of alternative binding modes, a problem compounded by competing electrostatic and van der Waals interactions that produce a frustrated binding energy landscape. A hierarchical approach involving a hierarchy of energy functions has been proposed for the analysis of common failures in molecular docking [129,130]. This protocol identifies clusters of structurally similar low-energy conformations generated in equilibrium simulations with the simplified energy function, followed by energy minimization with the molecular mechanics force field. The successes and failures of docking simulations have been explained on the basis of the thermodynamic properties determined from equilibrium simulations and the shape of the underlying binding energy landscape.
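The soft/hard failure distinction defined above reduces to a comparison of two energies once a pose has been judged misdocked. The sketch below encodes that rule. The 2.0 Å RMSD cutoff used to call a pose correct is an assumed, commonly used convention rather than a value taken from this chapter, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DockingResult:
    e_docked: float       # energy of the lowest-energy docked conformation
    e_crystal_min: float  # energy of the crystal pose after minimization, same force field
    rmsd: float           # RMSD (angstrom) of the docked pose from the crystal pose


def classify(result, rmsd_cutoff=2.0):
    """Label a docking run as success, soft failure or hard failure.

    rmsd_cutoff is an assumed convention for a correctly docked pose.
    """
    if result.rmsd <= rmsd_cutoff:
        return "success"
    if result.e_crystal_min < result.e_docked:
        # A lower-energy, crystal-like minimum exists but the search missed it.
        return "soft failure"
    # The energy function itself ranks the misdocked pose below the crystal pose.
    return "hard failure"
```

A soft failure points to more sampling; a hard failure points to the energy function, which is why the hierarchical protocol re-ranks clustered low-energy structures with a second force field.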
IV. Computer simulations of ligand-protein binding
In simulations of ligand-protein interactions, the rigid-body degrees of freedom and the rotatable torsion angles of the ligand are treated as independent variables. Ligand conformations and orientations are sampled in a parallelepiped that encompasses the binding site, obtained from the crystallographic structure of the corresponding complex, with a 5.0 Å cushion added to every side of the box. Rotatable bonds include those linking sp3-hybridized atoms to either sp3- or sp2-hybridized atoms, and single bonds linking two sp2-hybridized atoms. The ligand bond lengths, bond angles and the torsional angles of the non-rotatable bonds were taken from the crystal structures of the bound wild-type ligand-protein complexes. Buried crystallographic water molecules are included in the simulations as part of the protein structure. We have pursued a 'plug-and-play' strategy with two different energy functions, the AMBER molecular mechanics force field [131,132] and a simplified energy function, along with two different sampling techniques, evolutionary programming [91] and Monte Carlo simulations [118,119,127,128]. The knowledge-based simplified energetic model includes intramolecular energy terms for the ligand, given by the torsional and nonbonded contributions of the DREIDING force field [133], and intermolecular ligand-protein steric and hydrogen-bond interaction terms calculated from a piecewise linear potential summed over all protein and ligand heavy atoms [19-21,91,92]. The parameters of the pairwise potential depend on six atom types: hydrogen-bond donor, hydrogen-bond acceptor, both donor and acceptor, carbon-sized nonpolar, fluorine-sized nonpolar and sulfur-sized nonpolar. Primary and secondary amines are defined as donors, while oxygen and nitrogen atoms with no bound hydrogens are defined as acceptors.
Sulfur is modeled as capable of making long-range, weak hydrogen bonds, which allows for the closer sulfur-donor contacts seen in some of the crystal structures. Crystallographic water molecules and hydroxyl groups are defined as both donor and acceptor, and carbon atoms are defined as nonpolar. The steric and hydrogen bond-like potentials have the same functional form, with an additional three-body contribution to the hydrogen-bond term. The parameters were refined so that the experimental crystallographic structures of a set of ligand-protein complexes correspond to the global energy minimum [91,92]. No assumptions regarding either favorable ligand conformations or specific ligand-protein interactions were made, and all buried crystallographic water molecules are included in the simulations as part of the protein structure. The all-atom energy function employed in this study contains an intramolecular term for the ligand, consisting of the van der Waals and torsional strain contributions of the DREIDING force field, and an intermolecular energy term describing interactions between the ligand and the protein. The short-range repulsive interactions present in many molecular force fields, such as AMBER, lead to rough energy surfaces with high energy barriers separating local minima; with such a force field, small changes in position can produce large energy changes. For molecular docking simulations, it has been shown that the energy surface must be smooth for robust structure prediction of ligand-protein complexes [92]; softening the potentials is a way to smooth the force field and enhance sampling of conformational space while retaining an adequate description of the binding energy landscape [119,125,134]. We have shown that the modified AMBER force field and the simplified piecewise linear (PL) energy function produce comparable results in docking simulations that predict the crystal structures of ligand-protein complexes [119,125].
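The pairwise part of the PL energy described above can be sketched as a trapezoid-shaped function of distance: a finite linear repulsion at short range, a linear descent into a flat well, and a linear return to zero. The sketch below assumes this generic shape; the parameter values in `STERIC` and `HBOND` are hypothetical placeholders rather than the refined parameters of [91,92]. It collapses the three nonpolar sizes into one class, treats like-type donor/donor and acceptor/acceptor pairs as steric, and omits the three-body hydrogen-bond factor.

```python
# Hypothetical illustrative parameters (distances in angstrom, energies in arbitrary units).
STERIC = dict(A=3.4, B=3.6, C=4.5, D=5.5, E=-0.4, F=20.0)
HBOND = dict(A=2.3, B=2.6, C=3.1, D=3.4, E=-2.0, F=20.0)

DONORS = {"donor", "both"}
ACCEPTORS = {"acceptor", "both"}


def piecewise_linear(r, A, B, C, D, E, F):
    """Generic piecewise linear pair energy: finite at r = 0, well depth E on [B, C)."""
    if r < A:
        return F * (A - r) / A        # linear repulsion, F at r = 0 falling to 0 at r = A
    if r < B:
        return E * (r - A) / (B - A)  # linear descent into the well
    if r < C:
        return E                      # flat well bottom
    if r < D:
        return E * (D - r) / (D - C)  # linear return to zero
    return 0.0


def is_hbond_pair(t1, t2):
    """True for donor/acceptor pairs; 'both' (water, hydroxyl) can play either role."""
    return (t1 in DONORS and t2 in ACCEPTORS) or (t1 in ACCEPTORS and t2 in DONORS)


def pair_energy(t1, t2, r):
    """Heavy-atom pair energy: H-bond parameters for donor/acceptor pairs, else steric."""
    params = HBOND if is_hbond_pair(t1, t2) else STERIC
    return piecewise_linear(r, **params)
```

Because every segment is linear and the repulsion is capped at F, the potential has no singularity, which is what makes it tolerant of the large moves made early in a stochastic search.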
Neither the modified AMBER energy function nor the PL energy function has singularities at short interatomic distances; both effectively explore accessible ligand binding modes and sample a large fraction of conformational space, particularly at high temperature. Although the standard AMBER force field is less amenable to searching, in principle it should describe the energetics of ligand-protein interactions more adequately, which is critical for correctly ordering the energies of SB203386 complexes with HIV-1 protease and its mutants. In this study, we employ a hierarchical approach in which the PL energy function is used in combination with parallel Monte Carlo simulated tempering [135-140] to sample the conformational space adequately and describe the multitude of inhibitor binding modes. The advantage of the simulated tempering approach is its ability not only to generate an accurate canonical distribution of the ligand-protein system over a wide temperature range, but also to search for the global energy minimum. The PL energy function is expected to characterize the density of low-energy states and describe the local basins surrounding binding modes. However, this function is less accurate in detecting the exact location and energetics of the native state, because it does not quantify the exact magnitude of ligand-protein interactions accurately. The standard AMBER molecular mechanics force field, in conjunction with a desolvation correction [141], is used to optimize the samples generated from the low-energy regions and thereby characterize more precisely the energetics of the inhibitor binding domains. A solvation term was added to the AMBER interaction potential to account for the free energy of interaction between the explicitly modeled atoms of the ligand-protein system and the implicitly modeled solvent.
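The chapter does not give the functional form of its softened AMBER potential [119,125,134], so the sketch below only illustrates the general idea with a common generic soft-core Lennard-Jones form, E = 4ε[u² - u] with u = σ⁶/(r⁶ + ασ⁶). Unlike the standard 12-6 term it stays finite as r → 0 (at α = 0.5 it reaches 8ε) and converges to the standard term at long range. The softening parameter `alpha` is a hypothetical choice.

```python
def lj(r, epsilon, sigma):
    """Standard 12-6 Lennard-Jones energy; diverges as r -> 0."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def soft_core_lj(r, epsilon, sigma, alpha=0.5):
    """Generic soft-core Lennard-Jones form (illustrative, not the chapter's exact potential).

    Replaces (sigma/r)^6 by sigma^6 / (r^6 + alpha*sigma^6), which removes the singularity.
    """
    s6 = sigma ** 6
    u = s6 / (r ** 6 + alpha * s6)
    return 4.0 * epsilon * (u * u - u)
```

At short range the softened term caps the repulsion at 4ε(1/α² - 1/α), so overlapping trial poses are penalized but not excluded, which smooths the landscape during sampling; the standard force field is then reserved for the final ranking step.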
IV.1. Computer simulations of ligand-protein docking
An evolutionary algorithm, a stochastic optimization technique based on the ideas of natural selection, was used in the ligand-protein docking simulations [91]. During the search, a population of candidate ligand conformers competes for survival, each member against a fixed number of opponents randomly selected from the remainder of the population. A win is assigned to the competitor with the lower energy, and the number of competitions a member wins determines its probability of surviving to the next generation. All surviving members produce offspring, subject to a constant population size. In the population of ligand conformers, each member is an encoded vector consisting of the rigid-body coordinates and the torsional angles about the rotatable bonds. The initial ligand conformations are generated by randomizing the encoded vector, with the center of mass of the ligand restricted to the rectangular parallelepiped that defines the active site. The three rigid-body rotational degrees of freedom, as well as the torsional angles for all rotatable bonds, are initialized uniformly between 0 and 360 degrees. In simulations with multiple protein conformations, each member of the initial population represents a ligand conformation with a protein conformation randomly assigned from the given ensemble. During the search, the surviving members of the population with the lowest energy represent the ligand conformation together with the corresponding protein conformation. The protein conformation of the winner is preserved when offspring are produced; otherwise, a new randomly selected protein conformation is assigned to a population member. For each docking simulation, the evolutionary search was performed for a total of 120 generations with a population size of 1200 members. To provide the necessary level of diversity, each member competes against three opponents in each generation. The standard deviation of the Gaussian mutation used to generate offspring is varied adaptively according to the selection pressure. As a result, large mutations are encouraged early in the simulation to facilitate a rapid search, while smaller mutations are made as the simulation progresses to refine solutions near the global energy minimum. The minimized best member of the final generation defines the predicted structure of the ligand-protein complex. Using the evolutionary search algorithm, we have carried out multiple independent docking simulations of the SB203386 inhibitor with an ensemble of 6 protease bound conformations generated from the crystallographically determined HIV-1 protease wild-type and mutant complexes with SB203386 [77,78,80,81]: 1) SB203386 wild-type (PDB entry 1sbg), 2) HIV-1 (2:31-37) chimera (1bdl), 3) HIV-1 (2:31-37,47,82) (1bdq), 4) HIV-1 (2:31,33-37) (1bdr), 5) HIV-1 protease triple mutant V32I/I47V/V82I (1tcx), and 6) SIV protease (1tcw). In addition, an extended set of 32 protease bound conformations, which included the protein conformations of the SB203386 complexes, was used in docking simulations. The remainder of this set consisted of the following crystallographically determined HIV-1 protease complexes: 7) hydroxyethylene inhibitor (1aaq), 8) SB203238 (1hbv), 9) SKF108738 (1hef), 10) SKF107457 (1heg), 11) CGP 53820 (1hih), 12) U75875 (1hiv), 13) SB204144 (1hosa), 14) SB204144 (1hosb), 15) SB206343 (1hpsa), 16) SB206343 (1hpsb), 17) VX-478 (1hpv), 18) GR126045 (1htfa), 19) GR126045 (1htfb), 20) GR137615 (1htga), 21) GR137615 (1htgb), 22) A7692S, 23) A77003 (R,S) (1hvi), 24) A78791 (S,-) (1hvj), 25) A76928 (S,S) (1hvk), 26) A76889 (R,R) (1hvl), 27) XK263 (1hvr), 28) V82A mutant with inhibitor A77003 (1hvs), 29) MVT101 (4hvp), 30) JG365 (7hvp), 31) U85548e (8hvp), and 32) A-74704 (9hvp).
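The evolutionary search described above can be illustrated with a toy version. This is a sketch under stated assumptions, not the program of [91]. `energy(x, k)` is a hypothetical scoring function for an encoded ligand vector x in protein conformation k. Coordinates are clipped to the box instead of treating angles as periodic. The offspring of the tournament winner inherit its protein conformation while other offspring receive a random one, which is one reading of the rule stated in the text. The adaptive mutation width uses the 1/5 success rule as a plainly swapped-in stand-in for the selection-pressure scheme, whose details are not given here.

```python
import numpy as np


def evolutionary_dock(energy, n_dim, n_prot, lo, hi, pop_size=1200,
                      n_gen=120, n_opponents=3, seed=0):
    """Toy evolutionary-programming search over ligand vectors and protein conformations.

    energy(x, k) -> float scores encoded ligand vector x in protein conformation k.
    pop_size should be even so that survivors + offspring keep the population constant.
    """
    rng = np.random.default_rng(seed)
    pop = rng.uniform(lo, hi, size=(pop_size, n_dim))   # randomized encoded vectors
    prot = rng.integers(n_prot, size=pop_size)          # randomly assigned conformations
    e = np.array([energy(x, k) for x, k in zip(pop, prot)])
    sigma = (hi - lo) / 4.0                             # initial Gaussian mutation width
    half = pop_size // 2
    for _ in range(n_gen):
        # Tournament: each member meets n_opponents random rivals; a win = lower energy.
        rivals = rng.integers(pop_size, size=(pop_size, n_opponents))
        wins = (e[:, None] < e[rivals]).sum(axis=1)
        # Survivors: the half with the most wins, ties broken by lower energy.
        survivors = np.lexsort((e, -wins))[:half]
        parents, par_prot, par_e = pop[survivors], prot[survivors], e[survivors]
        # Each survivor produces one offspring by Gaussian mutation.
        child = np.clip(parents + rng.normal(0.0, sigma, parents.shape), lo, hi)
        # The winner's offspring keeps its protein conformation; others get a random one.
        child_prot = rng.integers(n_prot, size=half)
        winner = np.argmin(par_e)
        child_prot[winner] = par_prot[winner]
        child_e = np.array([energy(x, k) for x, k in zip(child, child_prot)])
        # Stand-in adaptation (1/5 success rule): widen steps while offspring often
        # improve on their parents, narrow them as the search converges.
        success = np.mean(child_e < par_e)
        sigma = min(sigma * (1.22 if success > 0.2 else 0.82), hi - lo)
        pop = np.vstack([parents, child])
        prot = np.concatenate([par_prot, child_prot])
        e = np.concatenate([par_e, child_e])
    best = np.argmin(e)
    return pop[best], int(prot[best]), float(e[best])
```

With a toy quadratic energy plus a per-conformation offset, the search should settle on the lowest-offset conformation, mimicking how the ensemble docking selects a preferred protease structure.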
IV.2. Monte Carlo equilibrium simulations of ligand-protein thermodynamics
Parallel simulated tempering dynamics with multiple protein conformations can be considered a modification of the λ-dynamics approach [142-147], in particular of its extension that rapidly evaluates the relative binding affinities of a set of ligands to a given protein [146,147]. This methodology is based on the idea of a "hybrid" Hamiltonian that allows efficient calculation of thermodynamic quantities with a coupling parameter treated as a dynamic variable, rather than as a parameter for continuous transformation from one state to another. The λ-dynamics approach was further developed for competitive binding calculations with ensembles of multiple ligands, in which the ligands compete for a given receptor on the basis of their relative binding free energies. Rapid screening of binding affinities with the λ-dynamics method is a compromise between conventional free energy methods and empirical free energy methods. This methodology was found to be more efficient in evaluating multiple ligands because of the simultaneous search component of the technique [147]. Analogous to the λ-dynamics approach, binding free energy calculations of a given ligand with an ensemble of multiple protein conformations must contain two components: the free energy of the solvated protein, and the free energy of the complexed ligand-enzyme bound state. The first half of this binding affinity equation evaluates the solvation free energy of the protein conformations and should take into account the free energy changes between the unbound and closed forms of the HIV-1 protease and its mutants. We focus only on the second half of the binding affinity equation and analyze the results of competitive binding experiments, in which multiple protein conformations compete for the SB203386 inhibitor on the basis of the interaction energetics.
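In a competitive binding simulation of this kind, the occupancy of each protein conformation at a given temperature can be converted into relative free energies through the Boltzmann relation F_i - F_ref = -kT ln(P_i/P_ref). The sketch below assumes the input is simply the sequence of conformation labels recorded for one replica; the labels and counts in the usage example are made up. Conformations that are never visited drop out, since their free energy cannot be estimated from zero counts.

```python
import math
from collections import Counter

K_B = 0.0019872  # Boltzmann constant, kcal/(mol K)


def relative_free_energies(visits, temperature=300.0):
    """Relative free energies (kcal/mol) of protein conformations from occupancies.

    visits: sequence of conformation labels recorded for the replica at `temperature`.
    Uses F_i - F_ref = -kT ln(P_i / P_ref), with the most populated conformation as ref.
    """
    counts = Counter(visits)
    kt = K_B * temperature
    ref = max(counts, key=counts.get)
    return {label: -kt * math.log(n / counts[ref]) for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical occupancy record for three conformations A, B, C.
    demo = ["A"] * 700 + ["B"] * 200 + ["C"] * 100
    ranking = sorted(relative_free_energies(demo).items(), key=lambda kv: kv[1])
    for label, df in ranking:
        print(f"{label}: {df:.2f} kcal/mol")
```

Sorting by these values gives the ordering of protein conformations by interaction free energy that the competitive binding analysis relies on.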
In simulations with ensembles of multiple protein conformations, each replica of the ligand-protein system is associated with a protein conformation from the given ensemble. The protein conformations are assigned linearly to the temperature levels: they are assigned consecutively, starting from the highest temperature level, so that each protein conformation in the ensemble is assigned to at least one temperature level. We have carried out equilibrium simulations with the ensembles of protease conformations using parallel simulated tempering dynamics with 50 replicas of the ligand-protein system, assigned respectively to 50 different temperature levels uniformly distributed between 5300 K and 300 K. Local Monte Carlo moves are performed independently for each replica at the corresponding temperature level, and after a simulation cycle has been completed for all replicas, configuration exchanges between every pair of adjacent replicas are attempted. The m-th and n-th replicas, described by a common Hamiltonian $H(X)$, are associated with the inverse temperatures $\beta_m$ and $\beta_n$ and the corresponding conformations $X_m$ and $X_n$. The exchange of conformations between adjacent replicas $m$ and $n$ is accepted or rejected according to the Metropolis criterion with probability $p = \min(1, \exp[-\Delta])$, where $\Delta = (\beta_n - \beta_m)[H(X_m) - H(X_n)]$. Starting with the highest temperature, every pair of adjacent temperature configurations is tested for swapping until the lowest temperature is reached. This swapping process is repeated 50 times after each simulation cycle for all replicas. The exchange of conformations provides an improved global update that increases the thermalization of the system and overcomes slow dynamics at low temperatures on rough energy landscapes, thereby permitting regions with a small density of states to be sampled accurately. During the simulation, each replica has a non-negligible probability of moving through the entire temperature range, and detailed balance is never violated, which guarantees that each replica of the system is equilibrated in the canonical distribution at its own temperature [135-140]. Hence, we generate the canonical distribution of the ligand-protein system and the equilibrium distribution of protein conformations at each temperature.
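The exchange scheme described above can be sketched on a one-dimensional toy energy. The Metropolis swap test follows the stated criterion p = min(1, exp[-Δ]) with Δ = (β_n - β_m)[H(X_m) - H(X_n)]. Temperatures are uniformly spaced from 5300 K down to 300 K, and swap passes run from the highest to the lowest temperature. The energy function, the step size and the use of k_B in kcal/(mol K) are illustrative assumptions, and the sketch omits the protein-conformation labels carried by each replica.

```python
import math
import random

K_B = 0.0019872  # Boltzmann constant, kcal/(mol K)


def swap_accepted(beta_m, beta_n, e_m, e_n, rng):
    """Metropolis test for exchanging X_m <-> X_n: p = min(1, exp(-delta))."""
    delta = (beta_n - beta_m) * (e_m - e_n)
    return delta <= 0.0 or rng.random() < math.exp(-delta)


def parallel_tempering(energy, x0, t_high=5300.0, t_low=300.0, n_rep=50,
                       n_cycles=2000, step=0.5, n_swap_passes=50, seed=0):
    """Toy parallel tempering on a scalar coordinate; returns temps, coords, energies."""
    rng = random.Random(seed)
    # Uniformly spaced temperatures, index 0 = highest temperature.
    temps = [t_high + (t_low - t_high) * k / (n_rep - 1) for k in range(n_rep)]
    betas = [1.0 / (K_B * t) for t in temps]
    xs = [x0] * n_rep
    es = [energy(x0)] * n_rep
    for _ in range(n_cycles):
        # Independent local Metropolis move for each replica at its own temperature.
        for k in range(n_rep):
            x_new = xs[k] + rng.uniform(-step, step)
            e_new = energy(x_new)
            d = betas[k] * (e_new - es[k])
            if d <= 0.0 or rng.random() < math.exp(-d):
                xs[k], es[k] = x_new, e_new
        # Repeated swap passes over adjacent pairs, from highest to lowest temperature.
        for _ in range(n_swap_passes):
            for m in range(n_rep - 1):
                n = m + 1
                if swap_accepted(betas[m], betas[n], es[m], es[n], rng):
                    xs[m], xs[n] = xs[n], xs[m]
                    es[m], es[n] = es[n], es[m]
    return temps, xs, es
```

A hot replica holding a lower energy than its colder neighbour always passes the test, so good configurations found at high temperature migrate down the ladder, which is the mechanism that lets the low-temperature replicas escape kinetic traps.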
At equilibrium, the ratio of the time that the ligand-protein system spends in protein conformation $\Lambda = i$ to the time spent in protein conformation $\Lambda = j$ is determined by the Boltzmann distribution, $P(\Lambda_i = 1, \Lambda_{m \neq i} = 0)/P(\Lambda_j = 1, \Lambda_{m \neq j} = 0)$. This ratio provides a measure for ordering protein conformations according to their interaction free energies with the inhibitor. The protein conformations that deliver the lowest interaction energy for the inhibitor during the equilibrium simulation dominate the distribution with the highest probability. Monte Carlo simulations allow the step sizes to be optimized dynamically at each temperature, taking into account the inhomogeneity of the molecular system [148]. We update the maximum step sizes using the acceptance ratio method every cycle of 1000 sweeps, and store both the energy and the coordinates of the system at the end of each cycle. A sweep is defined as a single trial move for each degree of freedom of the system. For all these simulations, we equilibrated the system for 1000 cycles (one million sweeps) and collected data during 10,000 cycles (ten million sweeps), resulting in 10,000 samples at each temperature. A key parameter is the acceptance ratio, which is the ratio of accepted conformations to the total number of trial conformations. In a given cycle of the simulation, each degree of freedom can change randomly within a prespecified range determined by the acceptance ratio obtained during the previous cycle. This range varies from one degree of freedom to another because of the complex nature of the energy landscape. At the end of each cycle, the maximum step size is updated and used during the next cycle. Simulations are arranged in cycles. If $\langle P_j \rangle_i$ is the average acceptance ratio for degree of freedom $j$ after cycle $i$, the maximum step size $\alpha_j$ is rescaled for the next cycle as
$$\alpha_j^{i+1} = \alpha_j^{i}\,\frac{\ln[a P_{\mathrm{ideal}} + b]}{\ln[a \langle P_j \rangle_i + b]} \tag{1}$$
where $P_{\mathrm{ideal}}$ is the desired acceptance ratio, chosen to be 0.5. The parameters $a$ and $b$ keep the step sizes well-behaved when the acceptance ratio approaches 0 or 1. They are chosen so that the ratio $\alpha_j^{i+1}/\alpha_j^{i}$ is scaled down by a constant factor $s$ for $\langle P_j \rangle_i = 0$ and up by the same factor for $\langle P_j \rangle_i = 1$. Solving the equations
$$s^{-1} = \frac{\ln[a P_{\mathrm{ideal}} + b]}{\ln[b]} \tag{2}$$
$$s = \frac{\ln[a P_{\mathrm{ideal}} + b]}{\ln[a + b]} \tag{3}$$
with $s = 3$ yields $a = 0.673$ and $b = 0.065$.
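The update rule of Eq. (1) and the choice of $a$ and $b$ can be checked numerically. The sketch below is an illustration under stated assumptions, and the helper names `update_step_size` and `solve_a_b` are hypothetical. The parameters are recovered by eliminating $a$ between Eqs. (2) and (3). This gives $a P_{\mathrm{ideal}} + b = b^{1/s}$ and $a + b = b^{1/s^2}$, and the code then bisects on $b$ over an assumed bracket.

```python
import math

P_IDEAL = 0.5  # desired acceptance ratio quoted in the text


def update_step_size(step, acceptance, a=0.673, b=0.065, p_ideal=P_IDEAL):
    """Eq. (1): rescale one degree of freedom's maximum step after a cycle.

    b > 0 keeps the logarithm finite at zero acceptance, and a + b < 1 keeps it
    away from zero at full acceptance, so the rescaling factor stays bounded.
    """
    return step * math.log(a * p_ideal + b) / math.log(a * acceptance + b)


def solve_a_b(s=3.0, p_ideal=P_IDEAL, lo=1e-9, hi=0.5, tol=1e-12):
    """Solve Eqs. (2)-(3) for a and b by bisection on b.

    Eq. (2) gives a*p_ideal + b = b**(1/s); Eq. (3) then gives a + b = b**(1/s**2).
    The bracket [lo, hi] is an assumption chosen to exclude the degenerate point
    b = 1, a = 0, where the eliminated form also vanishes.
    """
    def a_of(b):
        return (b ** (1.0 / s) - b) / p_ideal

    def residual(b):
        return a_of(b) + b - b ** (1.0 / s ** 2)

    r_lo = residual(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        r_mid = residual(mid)
        if (r_mid < 0.0) == (r_lo < 0.0):
            lo, r_lo = mid, r_mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    return a_of(b), b


if __name__ == "__main__":
    a, b = solve_a_b()
    print(f"a = {a:.3f}, b = {b:.4f}")
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, round(update_step_size(1.0, p), 3))
```

With $s = 3$ and $P_{\mathrm{ideal}} = 0.5$, the bisection returns values close to the quoted $a = 0.673$ and $b = 0.065$. With those quoted values, the update leaves the step unchanged at the ideal acceptance ratio, multiplies it by roughly 3 at full acceptance, and divides it by roughly 3 at zero acceptance.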
IV.3. Monte-Carlo data analysis with the weighted histogram method
The energy landscape approach can elucidate such general properties of molecular recognition as the nature of the thermodynamic phases and barriers on the ligand-protein association pathway [127,128]. This method evaluates equilibrium thermodynamic properties of the system from Monte Carlo simulations over a broad temperature range, with the aid of optimized data analysis and the weighted histogram analysis technique [148-153]. Monte Carlo simulations can be used to calculate equilibrium averages of any quantity of interest, but in general computing these averages at different temperatures requires independent simulations at each temperature. With the single histogram method, thermodynamic properties can be calculated at temperatures other than the simulation temperature, provided that the density of states is sampled accurately in the relevant range of energies [148,149,154-156]. In practice, this requirement limits the applicability of the single histogram method to temperatures near the simulation temperature. The multiple histogram method [150,151] optimally combines simulation data obtained at many discrete temperatures to provide an improved estimate of the density of states, which can then be used over a continuous range of temperatures. A generalization of the multiple histogram method, the weighted histogram analysis method (WHAM), estimates the density of states from data collected using umbrella sampling [151-153]. All of these histogram methods have been applied to simulations of biomolecules. In lattice models of protein folding, histograms have been used to calculate the native state probability density as a function of temperature [154]. They have also been used to calculate the potential of mean force (PMF) as a function of the number of native contacts [155,156].
Histograms have also been used to compute the PMF for both one- and multi-dimensional reaction coordinates at constant temperature [152,153,157]. Alternative methods such as free energy perturbation and various weighting schemes are sufficient to compute one-dimensional PMFs, but WHAM has been shown to be preferable for computing two-dimensional PMFs [158]. Consider N simulations carried out at different temperatures, where the nth simulation is performed at inverse temperature $\beta_n$ and yields the estimate $W_n(E)$ of the density of states. We write the probability distribution as follows:
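$$p_n(E) = H_n(E)/N_n = W_n(E)\,\exp(-\beta_n E + f_n) \tag{4}$$
where $H_n(E)$ is the energy histogram of the $n$th simulation, $N_n$ is its number of samples, and
$$\exp(f_n) = \exp(\beta_n A_n) = Z(\beta_n)^{-1}, \tag{5}$$
with $A_n$ the free energy at $\beta_n$.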
The objective of the weighted histogram method is to obtain the best estimate of the density of states $W(E)$ by combining the simulations at all temperatures. This estimate can be written as a weighted sum of the $N$ estimates $W_n(E)$ ($n = 1, \ldots, N$):
$$W(E) = \sum_n P_n(E)\, W_n(E) \tag{6}$$
with the normalization condition $\sum_n P_n(E) = 1$. Importantly, the weights $P_n(E)$ depend on $E$. Each simulation, performed at its own temperature, can therefore be given a confidence level that varies with energy. In this work, we apply the weighted histogram analysis method to compute ligand-protein binding energy landscapes, $F(R,T)$, as a continuous function of temperature and reaction coordinate. These landscapes are determined by first tabulating two-dimensional histograms $H_i(E,R)$ from the various constant-temperature equilibrium simulations $i$. We then solve the self-consistent multiple histogram equations [150] to yield the density of states
$$W(E,R) = \frac{\sum_{i=1}^{M} g_i^{-1} H_i(E,R)}{\sum_{j=1}^{M} g_j^{-1} n_j \exp[-(E - F_j)/k_B T_j]}, \tag{7}$$
where
$$\exp[-F_j/k_B T_j] = \sum_E W(E)\exp[-E/k_B T_j], \tag{8}$$
and $W(E) = \sum_R W(E,R)$. Here $g_j$ depends on the correlation time $\tau_j$ as $g_j = 1 + 2\tau_j$, and $n_j$ is the number of samples at the temperature $T_j$.
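The self-consistent solution of Eqs. (7) and (8) is usually obtained by simple iteration. The minimal sketch below iterates the one-dimensional form of these equations, with the histograms summed over $R$, until the free energies $F_j$ stop changing. It is not the authors' implementation, and the function names are hypothetical. The sketch assumes $k_B = 1$ (reduced units) and a pre-binned energy axis. It fixes the arbitrary normalization of $W(E)$ by setting the free energy of the first temperature to zero, and it works in log space to avoid overflow.

```python
import math

K_B = 1.0  # assumption: reduced units


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    finite = [x for x in xs if x != -math.inf]
    if not finite:
        return -math.inf
    m = max(finite)
    return m + math.log(sum(math.exp(x - m) for x in finite))


def multiple_histogram(energies, hists, temps, g=None, tol=1e-10, max_iter=10000):
    """Iterate the 1D multiple-histogram equations for ln W(E) and F_j.

    energies  bin-centre energies E
    hists     hists[i][e] = H_i(E_e), counts from the run at temps[i]
    g         statistical inefficiencies g_i = 1 + 2 tau_i (default 1)
    Returns (ln_W, F) with the normalization chosen so that F[0] = 0.
    """
    M = len(temps)
    g = g if g is not None else [1.0] * M
    betas = [1.0 / (K_B * t) for t in temps]
    # ln(n_j / g_j); assumes every run has at least one sample
    ln_n_over_g = [math.log(sum(h) / gi) for h, gi in zip(hists, g)]
    # Numerator: ln sum_i g_i^-1 H_i(E); empty bins get ln W = -inf
    ln_num = []
    for e in range(len(energies)):
        total = sum(hists[i][e] / g[i] for i in range(M))
        ln_num.append(math.log(total) if total > 0 else -math.inf)

    F = [0.0] * M
    for _ in range(max_iter):
        # Denominator: ln sum_j g_j^-1 n_j exp[-(E - F_j)/k_B T_j]
        ln_W = [ln_num[e] - logsumexp([ln_n_over_g[j] - betas[j] * (E - F[j])
                                       for j in range(M)])
                for e, E in enumerate(energies)]
        # Eq. (8): ln Z_j = ln sum_E W(E) exp(-E/k_B T_j), and F_j = -k_B T_j ln Z_j
        ln_Z = [logsumexp([lw - b * E for lw, E in zip(ln_W, energies)]) for b in betas]
        shift = ln_Z[0]  # rescale W so that F_0 = 0
        ln_W = [lw - shift for lw in ln_W]
        F_new = [-(lz - shift) / b for lz, b in zip(ln_Z, betas)]
        done = max(abs(x - y) for x, y in zip(F_new, F)) < tol
        F = F_new
        if done:
            break
    return ln_W, F
```

For histograms generated exactly from a known density of states, the iteration recovers that density up to the overall normalization. This makes a convenient correctness check before the routine is applied to real Monte Carlo data.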
Although these equations are expressions for the density of states as a function of both energy and reaction coordinate, the free energies are identical to those obtained from the standard one-dimensional multiple histogram equation:
$$W(E) = \sum_R W(E,R) = \frac{\sum_{i=1}^{M} g_i^{-1} H_i(E)}{\sum_{j=1}^{M} g_j^{-1} n_j \exp[-(E - F_j)/k_B T_j]}, \tag{9}$$
where
$$H_i(E) = \sum_R H_i(E,R), \tag{10}$$
and $H_i(E)$ is the standard one-dimensional histogram as a function of energy. These equations are precisely the self-consistent equations for the free energies in the one-dimensional multiple histogram method. Hence, the one-dimensional equations can be used to determine the free energies $F_j$, and then to compute the multi-dimensional density of states $W(E,R)$. In this way, calculating the multi-dimensional density of states as a function of $E$ and $R$ requires no additional computational effort beyond tabulating the simulation data as a function of the reaction coordinate as well as the energy. The only difficulty is that more sampling is required to ensure adequate statistics. From the density of states $W(E,R)$, the potential of mean force $F(R,T)$ at an arbitrary temperature, relative to a reference position $R_c$, can be computed from the probability density $P(R,T)$ as
$$F(R,T) = -k_B T \ln[P(R,T)/P(R_c,T)], \tag{11}$$
where
$$P(R,T) = \sum_E P_T(E,R) \tag{12}$$
and
$$P_T(E,R) = W(E,R)\exp[-E/k_B T]. \tag{13}$$
We define $R$ to be the root mean square deviation (RMSD) of the ligand coordinates from the native state. The native state is chosen as the reference state, so $R_c = 0.0$.
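Once the free energies $F_j$ are known (for example, from the one-dimensional equations), Eq. (7) and Eqs. (11)-(13) turn two-dimensional histograms into a PMF at any temperature. The sketch below works in log space to avoid overflow. Its function names are hypothetical, and it assumes $k_B = 1$ and a pre-binned $R$ axis with the reference $R_c$ in bin 0. The RMSD helper further assumes that ligand coordinates are already expressed in the fixed protein frame, so no superposition is performed.

```python
import math

K_B = 1.0  # assumption: reduced units


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    finite = [x for x in xs if x != -math.inf]
    if not finite:
        return -math.inf
    m = max(finite)
    return m + math.log(sum(math.exp(x - m) for x in finite))


def ligand_rmsd(coords, native):
    """RMSD over matched ligand atoms, without superposition (fixed protein frame)."""
    sq = sum((a - b) ** 2 for p, q in zip(coords, native) for a, b in zip(p, q))
    return math.sqrt(sq / len(native))


def ln_dos_2d(hists2d, energies, temps, F, g=None):
    """Eq. (7) in log form; hists2d[i][e][r] = H_i(E_e, R_r)."""
    M = len(temps)
    g = g if g is not None else [1.0] * M
    ln_n_over_g = [math.log(sum(map(sum, h)) / gi) for h, gi in zip(hists2d, g)]
    n_r = len(hists2d[0][0])
    ln_W = []
    for e, E in enumerate(energies):
        ln_den = logsumexp([ln_n_over_g[j] - (E - F[j]) / (K_B * temps[j])
                            for j in range(M)])
        row = []
        for r in range(n_r):
            num = sum(hists2d[i][e][r] / g[i] for i in range(M))
            row.append(math.log(num) - ln_den if num > 0 else -math.inf)
        ln_W.append(row)
    return ln_W


def pmf(ln_W, energies, T, ref_bin=0):
    """Eqs. (11)-(13): F(R,T) = -k_B T ln[P(R,T)/P(R_c,T)]; unsampled bins give +inf."""
    beta = 1.0 / (K_B * T)
    # Eqs. (12)-(13): ln P(R,T) = ln sum_E W(E,R) exp(-E/k_B T)
    ln_P = [logsumexp([ln_W[e][r] - beta * E for e, E in enumerate(energies)])
            for r in range(len(ln_W[0]))]
    return [(ln_P[ref_bin] - lp) / beta if lp != -math.inf else math.inf
            for lp in ln_P]
```

Because $W(E,R)$ is temperature independent, `pmf` can be called repeatedly with different `T` values to build up temperature slices of a binding free energy landscape without rerunning any simulation.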
V. Computer simulations of HIV-1 protease-inhibitor binding dynamics and thermodynamics
Crystal structures are available for the HIV-1 protease, HIV-2 protease, SIV protease and their various mutant forms in complexes with a diverse repertoire of structurally different inhibitors. These include, among others, SB203386 [78,80,81], U75875 [159,160], SKF107457 [161,162], A77003 [163,164] and U89360E [165]. Together they provide a collection of HIV-1 protease conformational substates that can be used to analyze the structural basis of inhibitor affinity and specificity for the resistant mutants. A molecular mechanism of inhibitor resistance has recently been suggested. In this mechanism, synthetic inhibitors with fewer degrees of freedom and low conformational entropy may be unable to adapt to backbone rearrangements or to distorted binding sites induced by mutations in the protease while still maintaining favorable interaction energetics. Increasing the size and flexibility of the inhibitor may allow it to conform to the protease flexibility caused by mutations in the active site. It may also allow the inhibitor to form new favorable interactions without a significant detrimental effect on binding affinity [58]. The general forces and interactions contributing to ligand-protein binding are well established. Nevertheless, a structure-based thermodynamic analysis coupled with an accurate representation of the interaction energetics is required to verify this hypothesis and to understand the molecular origins of inhibitor resistance in HIV-1 protease mutants. Computational analysis of HIV-1 protease has provided significant insights into problems of drug resistance in HIV-1 protease mutant forms [166,167] and has elucidated the nature of enzyme-inhibitor interactions [168-173]. In a molecular mechanics study [171], the complexes of saquinavir and indinavir inhibitors with the RSQ, V32I, M46I, V82A, V82I, V82F, I84V, V32I/I84V and M46I/I84V HIV-1 protease mutants were studied.
The calculated interaction energies showed a significant correlation with free energy differences, except for the RSQ mutant interacting with indinavir. Another computational approach has been presented for computing relative binding free energies of HIV-1 protease-inhibitor complexes in solution [172]. This method combines semiempirical quantum calculations to determine protonation states of the HIV-1 protease with molecular mechanics to determine the
gas-phase energetic contribution, and with a dielectric continuum solvation model to calculate electrostatic hydration free energies [172]. The relative binding free energies of saquinavir, indinavir, KNI272 and A77003 inhibitors with the HIV-1 protease and its I84V mutant have been computed. They showed that the changes in binding affinity of saquinavir are due to enthalpic interactions between the ligand and the enzyme. However, these changes can also be influenced by the hydration free energy contributions of the enzyme and the complex, as observed for the indinavir and A77003 inhibitors. Subsequently, molecular dynamics calculations were carried out for the same set of inhibitors with the HIV-1 protease and its I84V mutant. These calculations incorporated sampling of both ligand and protein conformations to evaluate binding free energies more accurately [173]. As a result, the binding free energy changes were found to correlate with the size of the cavities induced by mutation. Structural effects in the SB203386 complex with the HIV-1 protease triple mutant HIV-1 (2:32,47,82) were studied by a hierarchical computational approach that involves a hierarchy of energy functions and optimization strategies [107]. In this approach, flexible ligand docking used a Monte Carlo simulated annealing technique with the simplified energy function. Docking was combined with the dead-end elimination (DEE) algorithm, which uses the AMBER/OPLS force field for side-chain optimization of the protein active site residues. Each of the docked ligand conformations was used to generate the templates for a subsequent step of protein side-chain optimization with the DEE procedure. Energy evaluation and minimization of the generated DEE solutions were performed at the final stage of the protocol. The predicted structure of the ligand-protein complex was the complex with the lowest energy, measured in the AMBER force field [131,132] with the generalized Born and solvent-accessible surface area (GB/SA) solvation model [174].
We have observed a decrease in the energy of the complex when the unsymmetrical SB203386 ligand (Fig. 1) approaches RMSD = 9-10 Å from the crystal structure. This distance corresponds to a ligand binding mode that is the mirror image of the native conformation. This observation has been rationalized not as a flaw of the method, but as a reflection of the dimeric nature of HIV-1 protease complexes. The crystallographic binding mode corresponded to the complex with the lowest energy. It was also distinguished from the 'mirrored' alternative by its location in a wider energy funnel. This suggested that the binding energy landscapes of the SB203386 inhibitor interacting with the HIV-1 protease and the triple mutant combine correlated, funnel-like features with uncorrelated, rugged ones.
Figure 1: The chemical structure of SB203386.
Figure 2: The crystal structure of SB203386 bound to HIV-1 protease. The Connolly surface of the A chain monomer is shown.
Flexible ligand docking simulations of the SB203386 inhibitor were originally performed with a single wild-type HIV-1 protease conformation. Three energy functions were used: the standard AMBER force field, the modified soft-core AMBER force field in combination with a solvation correction, and the simplified energy function. In contrast to the standard AMBER energy function, both the modified AMBER force field and the simplified energy function are effective in exploring the multitude of inhibitor binding modes. The docking simulations yielded a 40% frequency of predicting the native binding mode within 2.0 Å RMSD from the crystal structure (Fig. 2). They also produced a considerable population of the symmetry-related binding mode at RMSD = 9-10 Å from the crystal structure, and a small fraction of misdocked local minima at RMSD = 4.0-6.0 Å from the crystal structure [125]. Simulations with a single rigid protein conformation, even with flexible protein side-chains, still ignore an important dynamic aspect of protein flexibility in the binding process. Current docking approaches can simulate local conformational perturbations of different protein side-chains. However, large-scale motions between different protein conformational substates that represent global conformational fluctuations, such as those revealed by X-ray crystallographic studies,
are difficult to sample. The crystal structures of HIV-1 protease complexes manifest the same general topological pattern of the protein structure in its bound conformation, but exhibit substantial protein backbone and side-chain flexibility [125]. Using the simplified energy function and preferential biased sampling of low-energy protein conformations, we have performed multiple docking simulations of SB203386 to the ensemble of 10 HIV-1 protease conformations. These conformations were obtained from the crystal structures with different HIV-1 protease inhibitors [125]. To understand the relationship between the multitude of inhibitor binding modes and the protein conformational substates, we have monitored the frequency distribution of predicted ligand-protein binding modes. We have also monitored the associated distribution of protein conformations. The native SB203386 binding mode was predicted exclusively with its corresponding crystal protein conformation. The symmetry-related binding mode was found primarily with the HIV-1 protease conformation from the ensemble corresponding to the complex with the SB206343 inhibitor [125]. However, the meta-stable binding domain at 4.0-6.0 Å RMSD from the crystal structure was found with many different protein conformations. Hence, the two dominant, symmetry-related binding modes of the SB203386 ligand were found to be protein-specific. In contrast, the misdocked local minima at 4.0-6.0 Å RMSD from the crystal structure can be associated with any of the protein conformations from the ensemble. Previously, we found that inhibitors with a similar number of internal degrees of freedom, such as SB203386 (13 rotatable bonds) and SKF107457 (14 rotatable bonds), can have very different free energy profiles in simulations with ensembles of multiple protein conformations. The SKF107457 native binding mode is found with various protein conformations, whereas the native binding mode of the SB203386 ligand is considerably less tolerant to protein fluctuations.
We have shown that the relative tolerance of SKF107457 to different protein substates may arise because its binding funnels are broad, while those of SB203386 are narrower [125]. In this work, we present Monte Carlo simulations of HIV-1 protease-SB203386 binding thermodynamics and dynamics with ensembles of protein conformations. These simulations implicitly incorporate protein flexibility and allow us to assess the role and contribution of the enzyme-inhibitor interactions in the molecular mechanisms of SB203386 inhibitor selectivity. We began with docking simulations of the SB203386 inhibitor to the ensemble of 6 protein conformations, using the PL energy function. We found two pronounced peaks that dominate the distribution (Fig. 3a). One corresponds to the native binding domain at 1.0-2.0 Å RMSD from the crystal structure, and the other to the symmetry-related binding mode at RMSD = 8.0-10.0 Å from the native state. The probability of predicting the crystallographic binding mode within 2.0 Å RMSD with the PL energy function is 40% (Fig. 3a), which is similar to the results with a single crystal protein conformation [125]. The distribution also reflects a number of misdocked low-energy solutions. These are scattered between RMSD = 2 Å and 8 Å from the crystal structure and contribute less than 30% to the distribution. In simulations with the ensemble of 32 protein conformations, the frequency of predicting the crystallographic binding mode decreases from 40% to 20%. There are still two pronounced peaks, corresponding to the native structure and to the symmetry-related binding mode at RMSD = 9-10 Å from the crystal structure (Fig. 3b). The results revealed a significant overlap between the energies of the native binding domain and the symmetry-related binding mode (Fig. 3c,3d). However, the misdocked binding modes, located between 2 Å and 8 Å RMSD from the native state, have higher energies and can be distinguished from the native structure (Fig. 3c,3d). We find a consistent decrease in energy in the vicinity of the crystallographic binding mode: the closer the ligand conforms to the crystal structure in the native binding domain, the lower the energy. These results suggest that the low-energy inhibitor binding modes are connected on the PL energy landscape by funnels of conformations that lead to the crystal structure via moderate energy barriers (Fig. 3c,3d). In contrast, docking simulations to the ensemble of 6 protein conformations with the AMBER energy function produce four different binding modes. These include two additional minima at RMSD = 3 Å and RMSD = 5 Å from the crystal structure (Fig. 4a,4b).
We find that the predicted bound conformations from the native binding domain are weakly connected with other minima because of high energy barriers (Fig. 4b). They also correspond to significantly higher energy states (Fig. 4c) than the minimized SB203386 crystal structures (Fig. 4d). Docking simulations with the standard AMBER force field resulted in rather limited sampling. This is due to high energy barriers and the sensitivity of the energy function to the precise geometry of ligand-protein binding modes. Conformations that deviate slightly from the exact crystal structure have high energy values, and the energy of the crystal structure after minimization with the AMBER force field (Fig. 4d) is significantly lower than the energy of
Figure 3: The frequency of predicting the crystal structure of the SB203386-HIV-1 protease complex in docking simulations with the ensemble of 6 protein conformations (a) and the ensemble of 32 protein conformations (b). The RMSD of the docked inhibitor conformations from the crystal structure ranked by energy in simulations with the ensemble of 6 protein conformations (c) and the ensemble of 32 protein conformations (d). The piecewise linear energy function is employed.
Figure 4: The frequency of predicting the crystal structure of the SB203386-HIV-1 protease complex in docking simulations with the ensemble of 6 protein conformations (a). The RMSD of the docked inhibitor conformations from the crystal structure ranked by energy in simulations with the ensemble of 6 protein conformations (b). The energy distribution histogram of the docked conformations (c). The filled histogram (d) reflects the energies of the minimized crystal structures for 1) 1sbg, 2) 1bdl, 3) 1bdq, 4) 1bdr, 5) 1tcx, 6) 1tcw protein structures. The standard AMBER energy function is used.
the lowest energy conformation found in docking simulations with AMBER (Fig. 4c). Hence, simulations with AMBER result in a soft failure in docking, in which the search algorithm is unable to find the global energy minimum on a rugged binding energy landscape. We next examined how the results of kinetic docking simulations with ensembles of protein conformations relate to the effects of protein flexibility. To do so, we monitored the frequency of different protein conformational substates in the predicted structures. We analyze the relationship between the multitude of ligand binding modes and protein conformational fluctuations. In particular, we investigate whether the native binding mode can tolerate multiple protein conformations from the ensemble of SB203386 complexes. The frequency distribution obtained with the PL energy function in simulations with the ensembles of multiple protein conformations is dominated by the 1sbg and 1bdr protein conformations (Fig. 5a). According to our hypothesis, the PL energy function detects the higher density of low-energy states and can describe the shape of the basins near the dominant binding domains. The results indicate that complexes with the 1sbg and 1bdr protein conformations may have broader basins of low-energy inhibitor-protein states near the native structure. However, the 1hpsa and 1hpsb protein conformations, which contribute more significantly to the distribution, primarily stabilize the symmetry-related binding mode of the inhibitor (Fig. 5b). Hence, the native binding mode of the SB203386 inhibitor is associated with only a small number of specific crystal protein conformations and is not tolerant to a broader ensemble of multiple protein conformations.
Structure-based thermodynamic analysis of inhibitor resistance in HIV-1 protease mutants has suggested where the molecular origins of this phenomenon reside. They appear to lie in the thermodynamic partitioning of the interaction energetics rather than in the identity of the protease residues that interact with the inhibitors [39,40]. Accurate evaluation of equilibrium thermodynamic properties requires adequate sampling of both ligand and protein conformations. Docking simulations with AMBER sample low-energy states only to a limited extent, and this energy function is highly sensitive to the precise geometry of the binding modes. Both observations indicate that equilibrium simulations with the molecular mechanics force field would be unlikely to achieve adequate sampling of the low-energy binding basins. We assume that the PL energy function follows the 'true' potential and can detect the density of low-energy states in the broad regions surrounding favorable binding modes. This assumption
Figure 5: The frequency of protein conformations for the SB203386-HIV-1 protease complex in docking simulations with the ensemble of 6 protein conformations (a) and the ensemble of 32 protein conformations (b). The piecewise energy function is used. The unfilled histogram is the total frequency for each conformation, the filled histogram is the frequency for ligand conformations that are within 2.0 Å RMSD of the crystal structure.
implies that the breadth of the local minima basins results from the long-range character of hydrophobic interactions. This breadth should therefore be recognizable by the simplified knowledge-based energy function [126]. The equilibrium samples from the low-energy basins generated with the PL energy function can reproduce the entropic component of the interaction free energies. The PL energy function has proven more adequate for sampling nonpolar and hydrogen-bond patterns. However, this simplified energy model does not include a direct electrostatic component. It may therefore be less accurate in detecting the exact energetics of the binding modes, especially when extensive networks of electrostatic interactions are present in the crystal structure. Consequently, we have applied a hierarchy of energy functions. This hierarchy combines the robustness of the PL energy function in characterizing the multitude of available low-energy binding modes and the topology of the binding energy landscape with the accuracy of the AMBER energy function in quantifying the exact magnitude of ligand-protein interactions [129,130]. Our hypothesis is that the simplified energy function tracks the low-energy basins of the true potential. If so, the equilibrium samples generated with the PL energy function can be mapped by direct AMBER minimization onto the corresponding low-energy conformations. These conformations characterize the energetics of the binding modes more adequately, but are otherwise unreachable in direct sampling of the AMBER binding energy landscape. The resulting energy values are best thought of as the enthalpy contributions to the interaction free energy. We perform equilibrium simulations with the PL energy function. After each cycle, the conformations generated at lower temperatures are minimized with each protein conformation from the ensemble using the complete AMBER force field.
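The counting step of this hierarchical protocol, which underlies a distribution of protein conformations such as the one in Fig. 8a, can be sketched as follows. This is an illustration only. `minimize_energy` is a hypothetical stand-in for an AMBER minimization of a ligand conformation inside a given protein conformation, and the sketch assumes that every ligand sample stored from the low-temperature PL simulation in a cycle is scored.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, List, Sequence, Tuple


def tally_best_conformations(
    cycles: Iterable[Iterable[object]],
    proteins: Sequence[Hashable],
    minimize_energy: Callable[[object, Hashable], float],
) -> Tuple[Counter, List[float]]:
    """Score each sampled ligand against every protein conformation.

    For every ligand sample in every cycle, the hypothetical minimize_energy
    callback returns the minimized energy in one protein conformation; the
    conformation with the lowest energy is counted and that energy is recorded.
    """
    counts: Counter = Counter()
    best_energies: List[float] = []
    for ligands in cycles:
        for ligand in ligands:
            energies = {p: minimize_energy(ligand, p) for p in proteins}
            best = min(energies, key=energies.get)  # ties go to the first protein
            counts[best] += 1
            best_energies.append(energies[best])
    return counts, best_energies


if __name__ == "__main__":
    # Toy stand-in energies: ligand "x" prefers protein "A", ligand "y" prefers "B".
    toy = {("x", "A"): -10.0, ("x", "B"): -8.0, ("y", "A"): -7.0, ("y", "B"): -9.0}
    counts, best = tally_best_conformations([["x", "y"], ["x"]], ["A", "B"],
                                            lambda lig, p: toy[(lig, p)])
    print(counts, best)
```

Keeping the per-sample minimum energies alongside the counts makes it possible to plot an energy profile per protein conformation as well as the frequency histogram.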
To quantify the proposed rationale for inhibitor resistance, we characterize the adaptability of the SB203386 inhibitor to HIV-1 protease mutant conformations. We investigate the binding thermodynamics of SB203386 by generating binding free energy landscapes with ensembles of protein conformations. This approach implicitly incorporates protein flexibility effects when clarifying the role of enzyme-inhibitor interactions in inhibitor resistance. The binding free energy profiles generated from equilibrium simulations with the ensembles of multiple protein conformations provide insights into the relationship between various tiers of protein conformational fluctuations and the corresponding inhibitor response. We analyze the binding free energy landscapes by monitoring the temperature-dependent equilibrium distributions
of the inhibitor local minima and the corresponding protein conformations. We also monitor changes in the density of low-energy basins that surround the inhibitor binding modes. In the energy landscape model, conformational substates that represent local minima of the complex are organized in a hierarchy of inhibitor binding modes and corresponding families of protein conformational fluctuations. We find that the binding energy landscapes generated in simulations with the two ensembles of protein conformations have a similar overall shape. The native and symmetry-related binding modes remain the dominant energy minima throughout the entire temperature range (Fig. 6a,6b). Conformational fluctuations described by the extended ensemble of proteins trigger only moderate changes in the relative stability of the inhibitor binding modes. On this binding energy landscape, the free energy of the symmetry-related binding mode is lowered relative to the native structure (Fig. 6b). We monitored the temperature-dependent equilibrium distribution of inhibitor local minima in simulations with the ensemble of 6 protein conformations. We found that these binding domains have nearly equal free energies at higher temperatures, and the native structure is favored at lower temperatures (Fig. 6a). The binding domains at 4.0 Å and 8.0 Å RMSD from the crystal structure are meta-stable and become shallow at low temperature. The results show that the mobility of the SB203386 inhibitor is effectively restricted to the two symmetry-related binding modes, even at high temperatures. This restriction prevents the inhibitor from adapting to distorted binding sites induced in the mutant conformations. The limited repertoire of energetically favorable binding modes eliminates the 1bdl and 1tcw protein conformations, which have significantly lower affinity and a different mode of binding.
The distribution of protein conformations at low temperatures in simulations with the ensemble of 6 proteins is dominated by the 1sbg and 1bdr conformations, which stabilize the native binding mode (Fig. 7a,c). In simulations with the extended ensemble, the alternative binding mode is energetically favorable with the 1hpsa and 1hpsb conformations (Fig. 7b,d). The results of equilibrium simulations indicate that enzyme-inhibitor interactions are entropically favorable for the 1sbg and 1bdr protein conformations. These conformations have broader basins of low-energy states near the native structure than other protein conformations from the ensemble. It is also evident that the breadth of the low-energy basins, which reflects the entropy component of the interaction energetics, is insufficient to reproduce the differences in binding affinity for SB203386 complexes with HIV-1 protease mutants.
Figure 6: The binding free energy landscape for SB203386 generated with the piecewise linear energy function and the ensembles of 6 protein conformations (a) and 32 protein conformations (b). For each two-dimensional temperature slice, the reference energy F(R = 0, T) is defined to be zero.
Figure 7: The time-dependent history of the SB203386-HIV-1 protease system as a function of rmsd of the current ligand conformation relative to the crystal structure versus Monte Carlo cycle at 300K during equilibrium simulations with the ensemble of 6 protein conformations (a) and the ensemble of 32 protein conformations (b). The frequency of protein conformations for the SB203386-HIV-1 protease complex at T=300K in equilibrium simulations with the ensemble of 6 protein conformations (c) and the ensemble of 32 protein conformations (d). The piecewise energy function is used. The unfilled histogram is the total frequency for each conformation, the filled histogram is the frequency for ligand conformations that are within 2.0 Å RMSD of the crystal structure.
The enthalpy component of the binding mode energetics can be characterized by the distribution of low-energy protein conformations that results from minimization of the equilibrium samples generated with the PL energy function (Fig. 8a,b). These results appear to reproduce the experimental trend in binding affinities of the SB203386 complexes with HIV-1 protease mutants more adequately (Fig. 8). The enthalpy contribution ranks the 1sbg, 1tcx, 1bdq and 1bdr protein conformations, in that order, according to their contribution to the probability distribution. Despite little or no sampling of the less favored 1bdl and 1tcw protein conformations, we observe the correct general trend in the average interaction energetics. The energy profile of the minimized conformations with each protein conformation shows only small variations in the energetics of the native basin for the 1sbg and 1tcx complexes. More significant energy variations are observed for the 1bdq and 1bdr complexes (Fig. 8b). The favorable energetics of the 1sbg and 1tcx complexes is primarily determined by the enthalpy contribution of the interaction energy. This is the primary reason why the crystal structure energies correlate fairly well with the experimental data. Mutations in the 1bdr and 1bdq complexes result in a moderate decrease in binding affinity due to changes in the interaction energetics, which are determined by the combined effect of entropy and enthalpy components. The moderate decrease in affinity for the 1bdr chimera can be explained as a combination of changes in 30s loop stability and the loss of favorable interaction energy with the inhibitor. Compensatory changes in the active site of the 1bdq chimeric complex lead to interaction energies that are, on average, more favorable than in the 1bdr complex.
We have determined that the compensatory changes in residues 47 and 82 in the 1bdq complex restore the interaction energy and partly compensate for the loss of favorable contacts. As a result, the experimentally observed loss in affinity for the 1bdq complex could mainly result from structural flexibility in the 30's and 80's loops and its effect on dimer stability. It would then be less affected by the reduction in favorable enzyme-inhibitor interactions. Apparently, the reduction in dimer stability is more detrimental for the 1bdq complex than for the 1bdr complex, thereby offsetting the gain in inhibitor-protease interaction energy. A similar compensatory effect of residues 47 and 82 has also been observed in the comparative inhibition of the HIV-1 (2:32) and HIV-1 (2:32,47,82) mutants by SB203386. These mutants exhibit 16- and 6-fold increases in their Ki values, respectively, relative to that of the wild-type enzyme [80]. However, the dramatic loss in affinity for the 1bdl chimera complex is due to a significant loss of favorable interaction energy caused by structural changes in the vicinity of the mutated Ile32 residue. Changes in dimer stability may play only a secondary role in the observed loss of binding affinity.

[Figure 8 plots: (a) frequency versus protein conformation; (b) AMBER-minimized energy versus MC cycle.]

Figure 8: After each cycle of equilibrium simulations at 300 K with the PL energy function, the generated SB203386 conformations are minimized using the AMBER force field with each protein conformation from the ensemble of 6 proteins. The protein conformation that gives the lowest AMBER energy of the ligand-protein system is counted after each cycle and contributes to the resulting distribution of protein conformations (a). Profile of the AMBER-minimized energies for the SB203386-HIV-1 protease complex with the protein conformations 1sbg (black), 1tcx (yellow), 1bdl (red), 1bdq (blue), 1bdr (orange) and 1tcw (maroon) in the simulation window between 4,000 and 10,000 cycles (b).
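The counting procedure described in the Figure 8 caption can be sketched as follows. The sketch assumes the AMBER minimizations have already been run and their energies stored per cycle. The data layout, the function name `lowest_energy_distribution` and the example energies are hypothetical.

```python
from collections import Counter

def lowest_energy_distribution(energies_per_cycle):
    """Tally which protein conformation gives the lowest minimized energy.

    energies_per_cycle: one dict per MC cycle, mapping a protein conformation id
    to the minimized ligand-protein energy obtained with that conformation.
    Ties go to the first conformation listed in the dict.
    """
    counts = Counter()
    for energies in energies_per_cycle:
        best = min(energies, key=energies.get)  # conformation with lowest energy
        counts[best] += 1
    return counts
```

With three hypothetical cycles, {"1sbg": -12.0, "1tcx": -10.5}, {"1sbg": -9.0, "1tcx": -11.0} and {"1sbg": -13.2, "1tcx": -12.9}, 1sbg is counted twice and 1tcx once.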
VI. Conclusions

Computer simulations of HIV-1 protease-SB203386 binding dynamics and thermodynamics with ensembles of protein conformations allow protein flexibility to be incorporated implicitly. Based on the thermodynamic partitioning of the interaction energetics, these simulations complement a structure-based experimental analysis of inhibitor resistance in HIV-1 protease mutants and help rationalize the molecular origins of this phenomenon. To represent alternative protein conformational substates, we have introduced simulations with multiple protein conformations. These simulations account for protein flexibility by considering a finite number of protein states [175] with significant differences in both side-chain and main-chain conformation. We have observed that the dominant SB203386 binding modes are invariant in docking and equilibrium simulations with both ensembles of protein conformations. The native binding mode of the SB203386 inhibitor and the protein conformation are highly correlated, suggesting that this ligand is sensitive to the particular protein conformation into which it binds. We have adapted a hierarchy of simplified and detailed energy functions to describe the topology of the binding energy landscape and to characterize the energetics of the binding modes adequately. We have shown that the binding energy landscape approach, based on simulations with ensembles of multiple protein conformations, provides insight into the relationship between the distribution of inhibitor binding modes and the corresponding families of protein conformational fluctuations. Total binding energies are known to be determined mainly by the hydrophobic effect and entropy contributions. Nevertheless, the formation of favorable enzyme-inhibitor interactions is critical for predicting binding affinity differences for structurally similar ligands [168]. It is also critical for explaining changes in the binding affinity of a selective HIV-1 protease inhibitor for HIV-1 protease mutants.
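As a generic illustration of sampling over a finite set of protein states, the sketch below runs Metropolis Monte Carlo over a discrete protein-conformation index and checks the visit frequencies against exact Boltzmann probabilities. This is not the authors' implementation. In their simulations, the ligand position, orientation and torsions are sampled together with the protein conformation. Here only the conformation index is sampled, and the state energies (in units of kT) are hypothetical fixed values.

```python
import math
import random
from collections import Counter

def boltzmann_probabilities(energies, kT):
    """Exact Boltzmann probabilities for a finite set of state energies."""
    e_min = min(energies.values())  # shift energies for numerical stability
    weights = {s: math.exp(-(e - e_min) / kT) for s, e in energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}

def metropolis_states(energies, kT, n_steps, seed=0):
    """Metropolis Monte Carlo over a discrete set of protein conformations.

    Each step proposes a conformation uniformly at random (possibly the current
    one, which keeps the proposal symmetric) and accepts it with probability
    min(1, exp(-dE / kT)). Returns the fraction of steps spent in each state.
    """
    rng = random.Random(seed)
    states = list(energies)
    current = states[0]
    visits = Counter()
    for _ in range(n_steps):
        proposal = rng.choice(states)
        d_e = energies[proposal] - energies[current]
        if d_e <= 0 or rng.random() < math.exp(-d_e / kT):
            current = proposal
        visits[current] += 1
    return {s: visits[s] / n_steps for s in states}
```

For a long enough run, the output of `metropolis_states` approaches `boltzmann_probabilities` for the same energies. A symmetric proposal was chosen so that the plain Metropolis acceptance rule is sufficient.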
Thermodynamic analysis of SB203386 binding with HIV-1 protease mutants shows that the molecular origins of inhibitor resistance can be rationalized in part on the basis of the entropy and enthalpy contributions to the interaction energetics. The results support two points for the protease-specific inhibitor SB203386. First, the enthalpy component of the interaction energy contributes to the changes in binding affinity. Second, the inhibitor's limited ability to adjust to mutations in the active site results in the loss of favorable interactions. We have found that the molecular mechanisms of inhibitor resistance may be determined primarily by the distribution of inhibitor binding modes and the topology of the underlying energy landscape. They may be less sensitive to the internal flexibility of the inhibitor. Rigid ligands with low conformational entropy and a native-like binding mode that tolerates protein fluctuations could be highly potent inhibitors that are less susceptible to structural changes caused by mutations. The binding energy landscape analysis suggests a thermodynamic mechanism for regulating SB203386 inhibitor specificity. In this mechanism, ligand binding induces subtle changes in the relative energetic stability of the ligand binding modes rather than significant perturbations in the overall shape of the energy landscape.
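The 16- and 6-fold Ki increases cited earlier for the HIV-1 (2:32) and HIV-1 (2:32,47,82) mutants [80] can be placed on the free-energy scale on which the enthalpy/entropy partitioning operates. The sketch uses the standard relations ΔG = RT ln Ki (Ki in molar units, treated as a dissociation constant) and ΔG = ΔH - TΔS. The temperature of 298.15 K is an assumption, because the assay temperature is not given here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def ddg_from_ki_ratio(fold_increase, temperature=298.15):
    """Loss of binding free energy (kcal/mol) for a fold increase in Ki.

    From dG_bind = RT ln Ki it follows that ddG = RT ln(Ki_mut / Ki_wt).
    """
    if fold_increase <= 0:
        raise ValueError("fold increase must be positive")
    return R_KCAL * temperature * math.log(fold_increase)

def gibbs(dh, ds, temperature=298.15):
    """dG = dH - T dS, with dH in kcal/mol and dS in kcal/(mol K)."""
    return dh - temperature * ds
```

At 298 K, a 16-fold increase in Ki corresponds to roughly 1.6 kcal/mol of lost binding free energy and a 6-fold increase to roughly 1.1 kcal/mol. Modest affinity losses therefore translate into free-energy differences of only about 1 to 2 kcal/mol.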
References
[1] I.D. Kuntz, Science, 257 (1992) 1078. [2] T.P. Straatsma and J.A. McCammon, Annu. Rev. Phys. Chem., 43 (1992) 407. [3] B.K. Shoichet, R.M. Stroud, D.V. Santi, I.D. Kuntz and K.M. Perry, Science, 259 (1993) 1445. [4] J. Cherfils and J. Janin, Curr. Opin. Struct. Biol., 3 (1993) 265. [5] P. Kollman, Chem. Rev., 93 (1993) 2395. [6] I.D. Kuntz, E.C. Meng and B.K. Shoichet, Acc. Chem. Res., 27 (1994) 117. [7] Ajay and M.A. Murcko, J. Med. Chem., 38 (1995) 4953. [8] T.P. Lybrand, Curr. Opin. Struct. Biol., 5 (1995) 224.
[9] R. Rosenfeld, S. Vajda and C. DeLisi, Annu. Rev. Biophys. Biomol. Struct., 24 (1995) 677. [10] G. Jones and P. Willett, Curr. Opin. Biotechnol., 6 (1995) 652. [11] D.A. Gschwend, A.C. Good and I.D. Kuntz, J. Mol. Recognit., 9 (1996) 175. [12] B.K. Shoichet, A.R. Leach and I.D. Kuntz, Proteins: Struct. Funct. Genet., 34 (1999) 4. [13] H. Frauenfelder and P.G. Wolynes, Physics Today, 47 (1994) 58. [14] J.D. Bryngelson, J.N. Onuchic, N.D. Socci and P.G. Wolynes, Proteins: Struct. Funct. Genet., 21 (1995) 167. [15] K.A. Dill, S. Bromberg, K. Yue, K.M. Fiebig, D.P. Yee, P.D. Thomas and H.S. Chan, Protein Sci., 4 (1995) 561. [16] K.A. Dill and H.S. Chan, Nature Struct. Biol., 4 (1997) 10. [17] E.I. Shakhnovich, Curr. Opin. Struct. Biol., 7 (1997) 29. [18] J. Janin, Proteins: Struct. Funct. and Genet., 25 (1996) 438. [19] G.M. Verkhivker and P.A. Rejto, Proc. Natl. Acad. Sci. USA., 93 (1996) 60. [20] P.A. Rejto and G.M. Verkhivker, Proc. Natl. Acad. Sci. USA., 93 (1996) 8945. [21] G.M. Verkhivker and P.A. Rejto, Proteins: Struct. Funct. Genet., 28 (1997) 313. [22] C-J. Tsai, D. Xu and R. Nussinov, Curr. Biol., 3 (1998) R71. [23] C-J. Tsai, S. Kumar, B. Ma and R. Nussinov, Protein Sci., 8 (1999) 1181. [24] R. Elber and M. Karplus, Science, 235 (1987) 318. [25] T. Noguti and N. Go, Proteins: Struct. Funct. and Genet., 5 (1989) 97.
[26] N. Go and T. Noguti, Chem. Scripta, 29A (1989) 151.
[27] H. Frauenfelder, Nature Struct. Biol., 2 (1995) 821.
[28] D.T. Leeson and D.A. Wiersma, Nature Struct. Biol., 2 (1995) 848. [29] H. Frauenfelder and D.T. Leeson, Nature Struct. Biol., 5 (1998) 757. [30] P.A. Rejto and S.T. Freer, Prog. Biophys. Molec. Biol., 66 (1996) 167. [31] A. Wlodawer and J.W. Erickson, Annu. Rev. Biochem., 62 (1993) 543.
[32] S.S. Abdel-Meguid, Med. Res. Rev., 13 (1993) 731. [33] K. Appelt, Perspect. Drug Disc. Design, 1 (1993) 23. [34] M.J. Todd, N. Semo and E. Freire, J. Mol. Biol., 283 (1998) 475. [35] J.A. D'Aquino, J. Gomez, V.J. Hilser, K.H. Lee, L.M. Amzel and E. Freire, Proteins: Struct. Funct. and Genet., 25 (1996) 143. [36] J. Gomez and E. Freire, J. Mol. Biol., 252 (1995) 337. [37] J. Gomez, V.J. Hilser, D. Xie and E. Freire, Proteins: Struct. Funct. and Genet., 22 (1995) 404. [38] V.J. Hilser, J. Gomez and E. Freire, Proteins: Struct. Funct. and Genet., 26 (1996) 123. [39] I. Luque, O. Mayorga and E. Freire, Biochemistry, 35 (1996) 13681. [40] J.S. Bardi, I. Luque and E. Freire, Biochemistry, 36 (1997) 6588. [41] I. Luque, M.J. Todd, J. Gomez, N. Semo and E. Freire, Biochemistry, 37 (1998) 5791. [42] T. Creighton (ed.), Protein Folding, W.H. Freeman and Company, New York, 1992. [43] G.I. Makhatadze and P.L. Privalov, Biophys. Chem., 51 (1994) 291. [44] G.I. Makhatadze and P.L. Privalov, Adv. Prot. Chem., 47 (1995) 307.
[45] J. Janin, Proteins: Struct. Funct. and Genet., 21 (1995) 30. [46] G.I. Makhatadze and P.L. Privalov, J. Mol. Biol., 232 (1993) 639. [47] P.L. Privalov and G.I. Makhatadze, J. Mol. Biol., 232 (1993) 660. [48] D. Xie and E. Freire, Proteins: Struct. Funct. and Genet., 19 (1994) 291. [49] D. Xie and E. Freire, J. Mol. Biol., 242 (1994) 62. [50] D. Xie, R. Fox and E. Freire, Protein Sci., 3 (1994) 2175. [51] V.J. Hilser and E. Freire, J. Mol. Biol., 262 (1996) 756. [52] V.J. Hilser and E. Freire, Proteins: Struct. Funct. and Genet., 27 (1997) 171. [53] K.H. Lee, D. Xie, E. Freire and M. Amzel, Proteins: Struct. Funct. and Genet., 20 (1994) 68. [54] V.J. Hilser, D. Dowdy, T.G. Oas and E. Freire, Proc. Natl. Acad. Sci. USA, 95 (1998) 9903. [55] M.J. Todd and E. Freire, Proteins: Struct. Funct. and Genet., 36 (1999) 147. [56] E. Freire, Proc. Natl. Acad. Sci. USA, 96 (1999) 10118. [57] A.M. Borman, S. Paulous and F. Clavel, J. Gen. Virol., 77 (1996) 419. [58] D.D. Ho, T. Toyoshima, H. Mo, D.J. Kempf, D. Norbeck, C.M. Chen, N.E. Wideburg, S.K. Burt, J.W. Erickson and M.K. Singh, J. Virol., 68 (1994) 2016. [59] A.H. Kaplan, S.F. Michael, R.S. Wehbie, M.F. Knigge, D.A. Paul, L. Everitt, D.J. Kempf, D.W. Norbeck, J.W. Erickson and R. Swanstrom, Proc. Natl. Acad. Sci. USA, 91 (1994) 5597. [60] Y. Lin, X. Lin, L. Hong, S. Foundling, R.L. Heinrikson, S. Thaisrivongs, W. Leelamanit, D. Raterman, M. Shah, B.M. Dunn and J. Tang, Biochemistry, 34 (1995) 1143.
[61] Z. Chen, Y. Li, E. Chen, D.L. Hall, P.L. Darke, C. Culberson, J.A. Sharer and L.C. Kuo, J. Biol. Chem., 269 (1994) 26344. [62] J.H. Condra, W.A. Schleif, O.M. Blahy, L.J. Gabryelski, D.J. Graham, J.C. Quintero, A. Rhodes, H.L. Robbins, E. Roth, M. Shivaprakash, D. Titus, T. Yang, H. Teppler, K.E. Squires, P.J. Deutsch and E.A. Emini, Nature, 374 (1995) 569. [63] D.J. Kempf, H.L. Sham, K.C. Marsh, C.A. Flentge, D. Betebenner, B.E. Green, E. McDonald, S. Vasavanonda, A. Saldivar, N.E. Wideburg, W.M. Kati, L. Ruiz, C. Zhao, L. Fino, J. Patterson, A. Molla, J.J. Plattner and D.W. Norbeck, J. Med. Chem., 41 (1998) 602.
[64] M. Markowitz, H. Mo, D.J. Kempf, D.W. Norbeck, T.N. Bhat, J.W. Erickson and D.D. Ho, J. Virol., 69 (1995) 701. [65] N.A. Roberts, J.A. Martin, D. Kinchington, A.V. Broadhurst, J.C. Craig, I.B. Duncan, S.A. Galpin, B.K. Handa, J. Kay, A. Krohn, R.W. Lambert, J.H. Merrett, J.S. Mills, K.E.B. Parkes, S. Redshaw, A.J. Ritchie, D.L. Taylor, G.I. Thomas and P.J. Machin, Science, 248 (1990) 358. [66] H. Jacobsen, K. Yasargil, D.L. Winslow, J.C. Craig, A. Krohn, I.B. Duncan and J. Mous, Virology, 206 (1995) 527. [67] B. Maschera, G. Darby, G. Palu, L.L. Wright, M. Tisdale, R. Myers, E.D. Blair and E. Furfine, J. Biol. Chem., 271 (1996) 33231. [68] D. Xie, S. Gulnik, E. Gustchina, B. Yu, W. Shao, W. Qoronfleh, A. Nathan and J.W. Erickson, Protein Sci., 8 (1999) 1702. [69] D.B. Olsen, M.W. Stahlhut, C.A. Rutkowski, H.B. Schock, A.L. van Olden and L. Kuo, J. Biol. Chem., 274 (1999) 23699. [70] P.J. Ala, E.E. Huston, R.M. Klabe, D.D. McCabe, J.L. Duke, C.J. Rizzo, B.D. Korant, R.J. DeLoskey, P.Y.S. Lam, C.N. Hodge and C-H. Chang, Biochemistry, 36 (1997) 1573. [71] S.W. Kaldor, V.J. Kalish, J.F. Davies II, B.V. Shetty, J.E. Fritz, K. Appelt, J.A. Burgess, K.M. Campanale, N.Y. Chirgadze, D.K. Clawson, B.A. Dressman, S.D. Hatch, D.A. Khalil, M.B. Kosa, P.P. Lubbehusen, M.A. Muesing, A.K. Patick, S.H. Reich, K.S. Su and J.H. Tatlock, J. Med. Chem., 40 (1997) 3979. [72] R.M. Klabe, L.T. Bacheler, P.J. Ala, S. Erickson-Viitanen and J. Meek, Biochemistry, 37 (1998) 8735.
[73] P.K. Jadhav, P.J. Ala, F.J. Woerner, C.H. Chang, S.S. Garber, E.D. Anton and L.T. Bacheler, J. Med. Chem., 40 (1997) 181. [74] P.J. Ala, E.E. Huston, R.M. Klabe, P.K. Jadhav, P.Y. Lam and C.H. Chang, Biochemistry, 37 (1998) 15042. [75] L. Tong, S. Pav, C. Pargellis, F. Do, D. Lamarre and P.C. Anderson, Proc. Natl. Acad. Sci. USA, 90 (1993) 8387. [76] J.P. Priestle, A. Fassler, J. Rosel, M. Tintelnot-Blomley, P. Strop and M.G. Grutter, Structure, 3 (1995) 381. [77] S.S. Abdel-Meguid, B.W. Metcalf, T.J. Carr, P. Demarsh, R.L. Des Jarlais, S. Fisher, D.W. Green, L. Ivanoff, D.M. Lambert, K.H.M. Murphy, S.R. Petteway, Jr., W.J. Pitts, T.A. Tomaszek, Jr., E. Winborne, B. Zhao, G.B. Dreyer and T.D. Meek, Biochemistry, 33 (1994) 11671. [78] S.S. Hoog, E.M. Towler, B. Zhao, M.L. Doyle, C. Debouck and S.S. Abdel-Meguid, Biochemistry, 35 (1996) 10279. [79] V.V. Sardana, A.J. Schlabach, P. Graham, B.L. Bush, J.H. Condra, J.C. Culberson, L. Gotlib, D.J. Graham, N.E. Kohl, R.L. LaFemina, C.L. Schneider, B.S. Wolanski, J.A. Wolfgang and E.A. Emini, Biochemistry, 33 (1994) 2004. [80] E.M. Towler, S.K. Thompson, T. Tomaszek and C. Debouck, Biochemistry, 36 (1997) 5128. [81] M.A. Swairjo, E.M. Towler, C. Debouck and S.S. Abdel-Meguid, Biochemistry, 37 (1998) 10928. [82] B.K. Shoichet and I.D. Kuntz, J. Mol. Biol., 221 (1991) 327. [83] P.H. Walls and M.J.E. Sternberg, J. Mol. Biol., 228 (1992) 277.
[84] I.A. Vakser and C. Aflalo, Proteins: Struct. Funct. Genet., 20 (1994) 320. [85] R.M. Jackson and M.J.E. Sternberg, J. Mol. Biol., 250 (1995) 258. [86] D. Fischer, S.L. Lin, H.J. Wolfson and R. Nussinov, J. Mol. Biol., 248 (1995) 459. [87] R. Norel, S.L. Lin, H.J. Wolfson and R. Nussinov, J. Mol. Biol., 252 (1995) 263. [88] H.A. Gabb, R.M. Jackson and M.J.E. Sternberg, J. Mol. Biol., 272 (1997) 106. [89] R.M. Jackson, H.A. Gabb and M.J.E. Sternberg, J. Mol. Biol., 276 (1998) 265. [90] A.R. Friedman, V.A. Roberts and J.A. Tainer, Proteins: Struct. Funct. Genet., 20 (1994) 15. [91] D.K. Gehlhaar, G.M. Verkhivker, P.A. Rejto, C.J. Sherman, D.B. Fogel, L.J. Fogel and S.T. Freer, Chem. Biol., 2 (1995) 317. [92] G.M. Verkhivker, P.A. Rejto, D.K. Gehlhaar and S.T. Freer, Proteins: Struct. Funct. Genet., 25 (1996) 342. [93] W. Welch, J. Ruppert and A.N. Jain, Chem. Biol., 3 (1996) 449. [94] M. Rarey, B. Kramer, T. Lengauer and G. Klebe, J. Mol. Biol., 261 (1996) 470. [95] M. Rarey, B. Kramer and T. Lengauer, J. Comput.-Aided Mol. Des., 11 (1997) 369. [96] A. Caflisch, P. Niederer and M. Anliker, Proteins: Struct. Funct. Genet., 13 (1992) 223. [97] T.N. Hart and R.J. Read, Proteins: Struct. Funct. Genet., 13 (1992) 206. [98] J. Apostolakis, A. Pluckthun and A. Caflisch, J. Comput. Chem., 19 (1998) 21.
[99] A. Di Nola, D. Roccatano and H.J.C. Berendsen, Proteins: Struct. Funct. Genet., 19 (1994) 174. [100] Z.R. Wasserman and C.N. Hodge, Proteins: Struct. Funct. Genet., 24, 227. [101] K.P. Clark and Ajay, J. Comput. Chem., 16 (1995) 1210. [102] C.M. Oshiro, I.D. Kuntz and J.S. Dixon, J. Comput.-Aided Mol. Des., 9 (1995) 113. [103] G. Jones, P. Willett, R.C. Glen, A.R. Leach and R. Taylor, J. Mol. Biol., 267 (1997) 727. [104] C.A. Baxter, C.W. Murray, D.E. Clark, D.R. Westhead and M.D. Eldridge, Proteins: Struct. Funct. Genet., 33 (1998) 367. [105] A.R. Leach, J. Mol. Biol., 235 (1994) 345. [106] J. Desmet, I.A. Wilson, M. Joniau, M. De Maeyer and I. Lasters, FASEB J., (1997) 164.
[107] L. Schaffer and G.M. Verkhivker, Proteins: Struct. Funct. Genet., 33 (1998) 295. [108] M. Totrov and R. Abagyan, Nature Struct. Biol., 1 (1994) 259. [109] D.R. Westhead, D.E. Clark and C.W. Murray, J. Comput.-Aided Mol. Des., 11 (1997) 209. [110] P.A. Rejto, G.M. Verkhivker, D.K. Gehlhaar and S.T. Freer, In: W. van Gunsteren, P. Weiner and A.J. Wilkinson (Eds.), Computational Simulation of Biomolecular Systems, ESCOM, Leiden (1997) 451. [111] N. Shah, P.A. Rejto and G.M. Verkhivker, Proteins: Struct. Funct. Genet., 28 (1997) 421. [112] P.E. Leopold, M. Montal and J.N. Onuchic, Proc. Natl. Acad. Sci. USA, 89 (1992) 8721. [113] J.N. Onuchic, P.G. Wolynes, Z. Luthey-Schulten and N.D. Socci, Proc. Natl. Acad. Sci. USA, 92 (1995) 3626.
[114] C. Zhang, J. Chen and C. DeLisi, Proteins: Struct. Funct. Genet., 34 (1999) 255. [115] M. Totrov and R. Abagyan, Proteins: Struct. Funct. Genet., Suppl. 1 (1997) 215. [116] B. Sandak, H.J. Wolfson and R. Nussinov, Proteins: Struct. Funct. Genet., 32 (1998) 159. [117] J.S. Dixon, Proteins: Struct. Funct. Genet., Suppl. 1 (1997) 198. [118] D. Bouzida, P.A. Rejto and G.M. Verkhivker, Int. J. Quantum Chem., 73 (1999) 113. [119] D. Bouzida, S. Arthurs, A.B. Colson, S.T. Freer, D.K. Gehlhaar, V. Larson, B.A. Luty, P.A. Rejto, P.W. Rose and G.M. Verkhivker, In: R.B. Altman, A.K. Dunker, L. Hunter, T. Klein and K. Lauderdale (Eds.), Pacific Symposium on Biocomputing-99, World Scientific, Singapore (1999) 426. [120] C.M. Oshiro and I.D. Kuntz, Proteins: Struct. Funct. and Genet., 30 (1998) 321. [121] T.J.A. Ewing and I.D. Kuntz, J. Comput. Chem., 18 (1997) 1175. [122] Y. Sun, T.J.A. Ewing, A.G. Skillman and I.D. Kuntz, J. Comput.-Aided Mol. Des., 12 (1998) 597. [123] D.M. Lorber and B.K. Shoichet, Protein Sci., 7 (1998) 938. [124] R.M. Knegtel, I.D. Kuntz and C.M. Oshiro, J. Mol. Biol., 266 (1997) 424. [125] D. Bouzida, P.A. Rejto, S. Arthurs, A.B. Colson, S.T. Freer, D.K. Gehlhaar, V. Larson, B.A. Luty, P.W. Rose and G.M. Verkhivker, Int. J. Quantum Chem., 72 (1999) 73. [126] D. Shortle, K.T. Simons and D. Baker, Proc. Natl. Acad. Sci. USA, 95 (1998) 11158.
[127] G.M. Verkhivker, P.A. Rejto, D. Bouzida, S. Arthurs, A.B. Colson, S.T. Freer, D.K. Gehlhaar, V. Larson, B.A. Luty, T. Marrone and P.W. Rose, J. Mol. Recognit., 12 (1999) 371. [128] P.A. Rejto, D. Bouzida and G.M. Verkhivker, Theor. Chem. Acc., 101 (1999) 138. [129] G.M. Verkhivker, P.A. Rejto, D. Bouzida, S. Arthurs, A.B. Colson, S.T. Freer, D.K. Gehlhaar, V. Larson, B.A. Luty, T. Marrone and P.W. Rose, J. Comput.-Aided Mol. Des., in press. [130] G.M. Verkhivker, P.A. Rejto, D. Bouzida, S. Arthurs, A.B. Colson, S.T. Freer, D.K. Gehlhaar, V. Larson, B.A. Luty, T. Marrone and P.W. Rose, In: A.K. Ghose and V.N. Viswanadhan (Eds.), Combinatorial Library Design and Evaluation: Principles, Software Tools and Applications for Drug Discovery, Marcel Dekker, New York (2000), in press. [131] S.J. Weiner, P.A. Kollman, D.A. Case, U.C. Singh, C. Ghio, G. Alagona, S. Profeta and P. Weiner, J. Am. Chem. Soc., 106 (1984) 765. [132] W.L. Jorgensen and J. Tirado-Rives, J. Am. Chem. Soc., 110 (1988) 1657. [133] S.L. Mayo, B.D. Olafson and W.A. Goddard III, J. Phys. Chem., 94 (1990) 8897. [134] T.C. Beutler, A.E. Mark, R.C. van Schaik, P.R. Gerber and W. van Gunsteren, Chem. Phys. Lett., 222 (1994) 529. [135] E. Marinari and G. Parisi, Europhys. Lett., 19 (1992) 451. [136] K. Hukushima and K. Nemoto, J. Phys. Soc. Jpn., 65 (1996) 1604. [137] U.H.E. Hansmann and Y. Okamoto, Phys. Rev. E, 54 (1996) 5863. [138] U.H.E. Hansmann and Y. Okamoto, Phys. Rev. E, 56 (1997) 2228. [139] U.H.E. Hansmann and Y. Okamoto, J. Comput. Chem., 18 (1997) 920. [140] U.H.E. Hansmann, Chem. Phys. Lett., 281 (1997) 140.
[141] P.F.W. Stouten, C. Frömmel, H. Nakamura and C. Sander, Mol. Simul., (1993) 97. [142] Z. Liu and B.J. Berne, J. Chem. Phys., 99 (1993) 6071. [143] B. Tidor, J. Phys. Chem., 97 (1993) 1069. [144] C. Jarque and B. Tidor, J. Phys. Chem. B, 101 (1997) 9362. [145] X. Kong and C.L. Brooks III, J. Chem. Phys., 105 (1996) 2414. [146] Z. Guo, C.L. Brooks III and X. Kong, J. Phys. Chem. B, 102 (1998) 2032. [147] Z. Guo and C.L. Brooks III, J. Am. Chem. Soc., 120 (1998) 1920. [148] D. Bouzida, S. Kumar and R.H. Swendsen, Phys. Rev. A, 45 (1992) 8894. [149] A.M. Ferrenberg and R.H. Swendsen, Phys. Rev. Lett., 61 (1988) 2635. [150] A.M. Ferrenberg and R.H. Swendsen, Phys. Rev. Lett., 63 (1989) 1195. [151] E.M. Boczko and C.L. Brooks III, J. Phys. Chem., 97 (1993) 4509. [152] S. Kumar, D. Bouzida, R.H. Swendsen, P.A. Kollman and J.M. Rosenberg, J. Comput. Chem., 13 (1992) 1011. [153] S. Kumar, J.M. Rosenberg, D. Bouzida, R.H. Swendsen and P.A. Kollman, J. Comput. Chem., 16 (1995) 1339. [154] N.D. Socci and J.N. Onuchic, J. Chem. Phys., 103 (1995) 4732. [155] A. Sali, E.I. Shakhnovich and M. Karplus, Nature, 369 (1994) 248. [156] L. Mirny, V.I. Abkevich and E.I. Shakhnovich, Fold. Des., 1 (1996) 103. [157] S. Kumar, P.W. Payne and M. Vasquez, J. Comput. Chem., 17 (1996) 1269. [158] B. Roux, Comput. Phys. Commun., 91 (1995) 275.
[159] N. Thanki, J.K. Rao, S.I. Foundling, W.J. Howe, J.B. Moon, J.O. Hui, A.G. Tomasselli, R.L. Heinrikson, S. Thaisrivongs and A. Wlodawer, Protein Sci., 1 (1992) 1061. [160] A.M. Mulichak, J.O. Hui, A.G. Tomasselli, R.L. Heinrikson, K.A. Curry, C.S. Tomich, S. Thaisrivongs, T.K. Sawyer and K.D. Watenpaugh, J. Biol. Chem., 268 (1993) 13103. [161] K.H.M. Murphy, E. Winborne, M.D. Minnich, J.S. Culp and C. Debouck, J. Biol. Chem., 267 (1992) 22770. [162] B. Zhao, E. Winborne, M.D. Minnich, J.S. Culp, C. Debouck and S.S. Abdel-Meguid, Biochemistry, 32 (1993) 13054. [163] E.T. Baldwin, T.N. Bhat, B. Liu, N. Pattabiraman and J.W. Erickson, Nature Struct. Biol., 2 (1995) 244. [164] J.W. Erickson, Nature Struct. Biol., 2 (1995) 523. [165] L. Hong, A. Treharne, J.A. Hartsuck, S. Foundling and J. Tang, Biochemistry, 35 (1996) 10627. [166] C.D. Rosin, R.K. Belew, G.M. Morris, A.J. Olson and D.S. Goodsell, Proc. Natl. Acad. Sci. USA, 96 (1999) 1369. [167] C.D. Rosin, R.K. Belew, W.L. Walker, G.M. Morris, A.J. Olson and D.S. Goodsell, J. Mol. Biol., 287 (1999) 77. [168] G.M. Verkhivker, K. Appelt, S.T. Freer and J.E. Villafranca, Protein Eng., 8 (1995) 677. [169] A. Wallqvist, R.L. Jernigan and D.G. Covell, Protein Sci., 4 (1995) 1881. [170] J.R. Collins, S.K. Burt and J.W. Erickson, Nature Struct. Biol., 2 (1995) 334. [171] I.T. Weber and R.W. Harrison, Protein Eng., 12 (1999) 469. [172] G.J. Tawa, I.A. Topol, S.K. Burt and J.W. Erickson, J. Am. Chem. Soc., 120 (1998) 8856.
[173] S.W. Rick, I.A. Topol, J.W. Erickson and S.K. Burt, Protein Sci., 7 (1998) 1750. [174] W.C. Still, A. Tempczyk, R.C. Hawley and T. Hendrickson, J. Am. Chem. Soc., 112 (1990) 6127. [175] H.A. Carlson and J.A. McCammon, Mol. Pharmacol., 57 (2000) 213.
L.A. Eriksson (Editor)
Theoretical Biochemistry - Processes and Properties of Biological Systems
Theoretical and Computational Chemistry, Vol. 9
© 2001 Elsevier Science B.V. All rights reserved
Chapter 9
Modelling G-protein coupled receptors
Christopher Higgs and Christopher A. Reynolds*
Department of Biological Sciences, Central Campus, University of Essex, Wivenhoe Park, Colchester, Essex, CO4 3SQ. Email: [email protected], URL: http://www.essex.ac.uk/bcs/staff/reync/
Introduction
G-protein coupled receptors (GPCRs) are a superfamily of integral membrane proteins that transmit an extracellular signal into the cell and invoke a cellular response 1. The diversity of this superfamily allows these receptors to mediate a large number of different signals and physiological functions, such as sight, smell, and food intake. GPCRs are also involved in many disorders, such as anxiety, epilepsy, stroke, Alzheimer's disease and Parkinson's disease 2. Because so many diseases are associated with GPCR malfunction, GPCRs currently make up approximately 50% of the targets of all prescribed drugs 3. GPCRs mediate their cellular response through the binding and subsequent activation of a class of GTPases termed G-proteins. Receptor activation is initiated on the cell surface by the binding of a specific agonist, such as acetylcholine binding to the muscarinic receptor. This binding causes a conformational change within the receptor. The conformational change passes the signal into the intracellular domain through the formation of a high-affinity agonist-receptor-G-protein complex. The interaction between the activated receptor and the heterotrimeric G-protein increases the rate of GDP dissociation (GDP is then replaced by GTP) and decreases the affinity of ligand binding to the receptor. The G-protein then dissociates from the receptor, accompanied by dissociation of α-GTP from the βγ dimer. The free α subunit can then couple with a large number of effector proteins, such as adenylyl cyclase and phospholipases A2 and C. This coupling results in the activation or inhibition of cAMP, IP3 and diacylglycerol production 2. The βγ subunit has also been shown to act on K+ channels, adenylyl cyclase and β-ARK 4. In addition, the γ subunit has recently been shown to act independently.
The major structural characteristic of the GPCR family is the seven hydrophobic domains, 20-30 residues in length, which span the membrane as antiparallel α-helices. The helices are interconnected by three intracellular loops (il1, il2, il3) and three extracellular loops (el1, el2, el3). Together they form a seven-helix bundle in a similar, but not identical, fashion to bacteriorhodopsin (bR) 5. However, despite the common structural motif, bR has no sequence similarity to the GPCRs and does not bind or activate a G-protein. Given the lack of structural data, modelling has become an important tool for understanding GPCR structure and function and for aiding rational drug design. A number of models have been proposed based on several different methods and structural templates. Here we review some of the approaches to predicting the structure of these receptors, predicting ligand-binding domains, modelling receptor activation and, finally, modelling the receptor-G-protein interaction. It is important to stress that theoretical studies on GPCRs differ greatly from those on many other systems: the incorporation of experimental data, including protein sequence data, plays a major role.
Receptor Structure and Modelling
Transmembrane domains
The 3-dimensional structures of a great number of globular proteins have now been solved to atomic resolution. However, very few membrane protein structures have been solved to this resolution. Multidimensional NMR is difficult to apply to membrane proteins for three reasons: they tend to be quite large, they can form multimeric complexes, and it is difficult to produce samples containing biological membranes. As a result, most structural data are obtained using X-ray crystallography and electron crystallography. However, these methods involve a number of technical problems. These include the overexpression, purification and concentration of the protein 6 and the preparation of 3D crystals 7. Therefore, in the absence of any 3-dimensional GPCR structure, molecular modelling techniques have become a valuable tool for exploring the structure of these receptors. One notable success in solving the structure of a membrane-bound protein was achieved by Henderson et al 8. They used electron cryo-microscopy to obtain a 3-dimensional density map of bacteriorhodopsin at a resolution of 3.5 Å parallel to the membrane (but at lower resolution perpendicular to the membrane), as shown in Figure 1. The bacteriorhodopsin structure was solved in 1990. Together with an increasing amount of mutagenesis data, it led to the publication of a number of GPCR models built by homology modelling 9-15. This technique takes the sequence of the protein of unknown structure and aligns it with another sequence that is believed to share a common 3-dimensional fold and for which a 3D structure is known. The accuracy of the final model therefore depends on the accuracy of the sequence alignment. However, there is no significant sequence similarity between bR and the GPCRs. As a result, GPCR homology models built on bR are likely to contain significant errors.
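The sketch below shows why model accuracy depends on the alignment. It is a minimal global alignment (Needleman-Wunsch) with a simple match/mismatch score and a linear gap penalty. The scoring values are illustrative assumptions. Practical homology-modelling pipelines normally use substitution matrices such as BLOSUM and affine gap penalties, which can change which residues end up paired.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    def s(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # f[i][j] = best score for aligning a[:i] with b[:j]
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          f[i - 1][j] + gap,
                          f[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and f[i][j] == f[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return f[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(aligned_a, aligned_b):
    """Identical residues as a percentage of columns containing no gap."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)
```

For instance, aligning "ACGT" with "AGT" places a single gap opposite C. Each residue of the second sequence is then assigned to a specific residue of the first, and in a homology model that assignment fixes where each residue sits on the template.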
Table 1 shows various sequence alignments between bR and the β2-adrenergic receptor. Clearly, a number of different sequence alignments have been used, which highlights the problems of aligning these two families.
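The Table 1 caption equates a shift of 3 or 4 residues with one helical turn and a vertical displacement of about 5 Å. This follows from ideal α-helix geometry: about 3.6 residues per turn and an axial rise of about 1.5 Å per residue. The short sketch below uses these textbook ideal values, which are not parameters taken from the models discussed.

```python
RISE_PER_RESIDUE = 1.5   # angstrom of axial rise per residue, ideal alpha-helix
RESIDUES_PER_TURN = 3.6  # residues per turn, ideal alpha-helix

def register_shift(n_residues):
    """Axial displacement (angstrom) and rotation about the helix axis (degrees,
    wrapped to [-180, 180)) caused by shifting an alignment by n_residues."""
    displacement = n_residues * RISE_PER_RESIDUE
    rotation = n_residues * 360.0 / RESIDUES_PER_TURN
    wrapped = (rotation + 180.0) % 360.0 - 180.0
    return displacement, wrapped

for n in (1, 3, 4, 7):
    d, r = register_shift(n)
    print(f"shift {n}: {d:.1f} A along the axis, {r:+.0f} deg about it")
```

A 3-residue shift moves a residue 4.5 Å along the axis and rotates it by -60°. A 4-residue shift gives 6.0 Å and +40°. So an alignment error of about one turn keeps roughly the same helix face toward the binding pocket but displaces it by about 5 Å vertically, whereas a 1-residue error turns it by 100°.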
Figure 1. The electron diffraction structure of bacteriorhodopsin (PDB code: 1BRD) at 3.5 Å resolution, as reported by Henderson et al 8. The structure is viewed from the extracellular side with helix 1 shown at the top. The ligand, retinal, is located between helices 3, 5 and 6 and forms a Schiff base with Lys 216 on helix 7.

Because there is some doubt over the validity of bacteriorhodopsin as a template for GPCRs, a number of models have been constructed without the use of a template. Thus, Dahl et al 16 constructed a model of the dopamine D2 receptor. Hydropathy plots of the available sequences confirmed the presence of seven hydrophobic domains. The α-helices were taken to be 27 residues in length. This length is consistent with the crystal structure of the Rhodopseudomonas viridis photosynthetic reaction centre, which contains 11 membrane-spanning α-helices 17. The most polar surface of each helix was oriented inward to form a central core. The structure was then subjected to energy minimisation and molecular dynamics. The final structure was a circular arrangement of helices with a large central cavity. Maloney-Huss et al 18 produced a model in a similar fashion. However, their final model was derived after a number of different helix packing orientations had been considered. When viewed from the extracellular side, the helices in both of these models were arranged in a clockwise manner. This differs from bacteriorhodopsin, in which the helices are arranged in an anti-clockwise manner. Zhang et al 19 proposed an alternative arrangement of the helices for a serotonin receptor model. Again, the model was constructed without the use of a template. However, it differed from the bacteriorhodopsin structure and from the structures suggested by Dahl 16 and Maloney-Huss 18: its helices had a mismatched arrangement that was neither clockwise nor anti-clockwise. A schematic diagram of the arrangement of helices in these models is shown in Figure 2.
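The hydropathy analysis used to identify the seven hydrophobic domains can be illustrated with a sliding-window Kyte-Doolittle profile. The sketch uses the Kyte-Doolittle scale with a 19-residue window and a 1.6 threshold, which are commonly quoted for transmembrane segment prediction. These choices are assumptions, because the text does not state which scale or window Dahl et al used.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each window; value i averages seq[i:i + window].

    Returns an empty list if seq is shorter than the window.
    Raises KeyError for non-standard residue letters.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy exceeds threshold.

    Returns (start, end) residue ranges, 0-based with end exclusive.
    """
    segments = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h > threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend current segment
            else:
                segments.append((start, end))
    return segments
```

Applied to a receptor sequence, a GPCR would be expected to show seven such peaks. The window smears segment boundaries by several residues, which is one reason helix ends, and hence helix lengths such as the 27 residues used here, remain a modelling choice.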
Figure 2. A schematic diagram of the arrangement of helices in various models as viewed from the extracellular side. (A) The circular arrangement with a large central cavity proposed by Maloney-Huss 18. (B) The mismatched arrangement of helices proposed by Zhang 19. (C) The anti-clockwise arrangement seen in bR and adopted in most current models.

Several receptor models have been generated using automated methods. Herzyk and Hubbard 20 used a rule-based technique to generate seven-helix bundles, incorporating a large amount of experimental and theoretical data. The method was validated by first generating a model of bacteriorhodopsin, which had a root mean square (rms) deviation of 1.9 Å from the crystal structure. Peitsch et al 21 developed the automated protein modelling server Swiss-Model 22. Models are constructed in a two-stage process. In the first stage, the seven transmembrane helices are represented as idealised, rigid helices. Structural restraints derived from theoretical and experimental data are used to fit the helices together, and a penalty function measures any violations of the structural restraints. This penalty function is then globally optimised using a Monte Carlo simulated annealing procedure to generate an optimal model. In the second stage, the ProMod package converts the optimal model into a full-atom model. Prusis et al 23 generated a model of the human melanocortin 1 receptor in a similar fashion. Again, a penalty function is used to monitor violations of structural restraints when fitting the idealised rigid helices together. The resulting receptor template is then refined using a Monte Carlo simulated annealing procedure to convert it into a full-atom model and to optimise the structure.
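The restraint-penalty optimisation described for the Swiss-Model and Prusis et al procedures can be illustrated with a toy version. In this version, helix axes are reduced to points in the membrane plane and the restraints are hypothetical lower and upper distance bounds. The penalty is a sum of squared violations, minimised by Metropolis Monte Carlo with geometric cooling. The function names, penalty form, move set and cooling schedule are assumptions and are not those of Swiss-Model or ProMod.

```python
import math
import random

def penalty(pos, restraints):
    """Sum of squared violations of distance restraints.

    pos: list of (x, y) helix-axis positions in the membrane plane (angstrom).
    restraints: list of (i, j, d_min, d_max) bounds on the axis-axis distance.
    """
    total = 0.0
    for i, j, d_min, d_max in restraints:
        d = math.dist(pos[i], pos[j])  # Euclidean distance (Python 3.8+)
        if d < d_min:
            total += (d_min - d) ** 2
        elif d > d_max:
            total += (d - d_max) ** 2
    return total

def anneal(pos, restraints, t_start=10.0, t_end=1e-3, cooling=0.995,
           step=1.0, seed=0):
    """Monte Carlo simulated annealing of rigid helix-axis positions.

    One randomly chosen helix is displaced per step. The move is accepted by
    the Metropolis criterion at temperature t, which is lowered geometrically.
    Returns the lowest penalty found and the corresponding positions.
    """
    rng = random.Random(seed)
    pos = [tuple(p) for p in pos]
    current = penalty(pos, restraints)
    best, best_pos = current, list(pos)
    t = t_start
    while t > t_end:
        k = rng.randrange(len(pos))
        trial = list(pos)
        x, y = trial[k]
        trial[k] = (x + rng.uniform(-step, step), y + rng.uniform(-step, step))
        new = penalty(trial, restraints)
        if new <= current or rng.random() < math.exp(-(new - current) / t):
            pos, current = trial, new
            if current < best:
                best, best_pos = current, list(pos)
        t *= cooling
    return best, best_pos
```

The high starting temperature lets the search cross barriers between alternative packings early on. The slow cooling then makes it increasingly greedy, which is the reason for using annealing rather than plain minimisation on a rugged restraint landscape.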
Table 1. Sequence alignment between the transmembrane helices of bR (helix 1 to helix 7) and those reported in various GPCR models based on the bR template9-14. The β2-AR sequence is recorded here for reference even where the authors modelled other receptors. In each helix, residues in bold are class A sequence motifs or residues involved in ligand binding. A shift of 3 or 4 residues corresponds to a difference of one turn and a vertical displacement of about 5 Å.
[Alignment of helices 1 to 7 for the models of Nordvall9, Cronet10, Yamoto11, Trumpp-Kallmeyer12, Livingstone13 and Oliveira14; the residue columns are not recoverable from the extracted text.]
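The note in Table 1 that a shift of 3 or 4 residues corresponds to about one turn and roughly 5 Å follows from ideal α-helix geometry. A short check, assuming the textbook values of 3.6 residues per turn and a 1.5 Å rise per residue:

```python
# Ideal α-helix geometry (textbook values, assumed here).
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5                           # Å along the helix axis
TWIST_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN    # 100° per residue


def register_shift(n_residues):
    """Turns, axial displacement (Å) and net side-chain rotation (deg, in [-180, 180))."""
    turns = n_residues / RESIDUES_PER_TURN
    rise = n_residues * RISE_PER_RESIDUE
    # Wrap the accumulated twist so the rotation reads as a deviation from the start.
    rotation = (n_residues * TWIST_PER_RESIDUE + 180.0) % 360.0 - 180.0
    return turns, rise, rotation


for n in (1, 3, 4):
    turns, rise, rotation = register_shift(n)
    print(f"{n} residues: {turns:.2f} turns, {rise:.1f} Å, {rotation:+.0f}°")
```

A 3-residue shift gives 4.5 Å and a 4-residue shift 6.0 Å, bracketing the "about 5 Å" in the caption, while the side-chain direction changes by only -60° or +40°. A single-residue shift instead rotates the side chain by 100°, onto a different face of the helix.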
The first structural data for a true GPCR came from the electron cryomicroscopy studies of bovine rhodopsin published by Schertler et al.24 in 1993. The projection map at 9 Å resolution, shown in figure 3, revealed a number of structural features, including an arc-shaped region of electron density plus four individual peaks. The individual peaks were interpreted as four helices orientated almost perpendicular to the membrane, and the arc of density as the three remaining, tilted helices. A direct comparison with the projection density maps of bacteriorhodopsin at 9 Å and 7 Å resolution, published by Henderson25 and Unwin26 respectively, reveals similarities between the two proteins, confirming the presence of seven helices in rhodopsin. However, the projection structure of rhodopsin is more rounded and slightly wider than that of bacteriorhodopsin, suggesting that the helices in rhodopsin are tilted and orientated differently. A second projection density map of bovine rhodopsin was published by Unger et al.27 in 1995. This map revealed structural details within the plane of the membrane to just beyond 10 Å resolution. Again, four of the seven helices were identifiable, confirming the interpretation of Schertler's earlier projection map24. However, one of these four helices was found to be more substantially tilted than previously thought.
Figure 3. The projection map of bovine rhodopsin at 9 Å resolution24. One unit cell contains four rhodopsin molecules, two of which have been circled. The density map shows four individual peaks, which were interpreted as four individual helices orientated almost perpendicular to the membrane. The arc-shaped density was assigned to the remaining three tilted helices. Reprinted by permission from Nature (1993) 362, 770-772. Copyright 1993 Macmillan Magazines Ltd.
In an elegant study, Baldwin28 aligned the 204 class A GPCR sequences available in 1993. A clear pattern emerged for the seven hydrophobic domains, despite the sequence identity between some family members being as low as 20%. With a putative assignment of the transmembrane domains, it was found that each interhelical loop is short in at least one family. Because every GPCR family is expected to fold in the same manner, this suggests that each helix, when positioned in three dimensions, must be closest to its neighbours in the sequence. This evidence indicates that the helix arrangement proposed in the model of Zhang et al.19 is incorrect. However, the clockwise model continued to receive much support, primarily because of evidence for an interaction between Asp 224 and Asn 729 in the gonadotrophin and
serotonin receptors that could not be reproduced in a regular α-helical model while simultaneously keeping Asn 719 and Leu 718 pointing into the binding site. Asn 719 was implicated in binding β-antagonists, largely through gain-of-function mutation studies on related receptors in which the corresponding residue is mutated to Asn; Leu 718 has been implicated in determining the β-adrenergic subtype specificity of norepinephrine and epinephrine, largely through correlated mutation analysis29 and theoretical calculations30. The clockwise model was ruled out by the studies of Liu et al.31 and Mizobe et al.32, in which mutations in chimeric receptors resulted in a gain of function, the second study implicating Leu 718 in ligand binding. Further evidence for the counterclockwise arrangement of helices is provided by studies of an engineered zinc site in the NK1 receptor by Elling et al.33, whose constraints are impossible to fulfil in a clockwise model. The numbering system used in this review for putative transmembrane residues is that of Oliveira et al.14; other residues are numbered according to their position in the native sequence. The clockwise models, and the experiments supporting them, have played an important role in GPCR modelling by highlighting the need to consider departures from an ideal α-helical structure for helix 7. Indeed, the substituted cysteine accessibility studies of Javitch et al.34 on helix 7 suggest that it is not a regular helix, but the jury is still out on the correct structure of helix 7, with debate surrounding the conformation at the (N/D)P of the NPXXY motif. We have suggested that a short 3₁₀-helix is a sufficient perturbation to satisfy all the experimental constraints35, while Konvicka et al.36, using database searching, homology modelling and Monte Carlo simulations, have suggested that helix 7 contains an Asx turn and a flexible hinge.
The suggestion of a 3₁₀-helix is consistent with the electron microscopy studies, which provide no evidence for a kink in this region (the observed kink, which Findlay predicted in 1986 as discussed below, lies near the extracellular end of the helix). Spin-labelling studies37,38 and the NMR studies of Yeagle et al.39 provide evidence that helix 7 extends beyond the membrane, and these observations are probably more consistent with a short 3₁₀-helical section than with an Asx turn. It is unlikely that both suggestions are correct, but both studies certainly highlight the need to consider the conformational properties of helix 7. Baldwin28 identified the orientations of the seven hydrophobic domains on the basis of several criteria. Positions where residues are conserved, positions where variability is restricted (i.e. fewer than 9 different amino acids occur and/or the range of side-chain sizes is restricted), and positions where polar residues can be accommodated are expected to face inward or towards another helix. On this basis, helices 1, 4 and 5 were identified as the most exposed to the lipid, whereas helices 2, 6 and 7 are less exposed, and helix 3 has very little lipid exposure. Using this evidence, together with experimental data such as site-directed mutagenesis results, a putative three-dimensional structure was proposed. The helix arrangement in this putative structure was found to be in good agreement with the projection map reported by Schertler24. A comparison of this model with the structure of bacteriorhodopsin showed significant structural differences, and this 1993 study confirmed that bacteriorhodopsin is not a suitable template for homology modelling of GPCRs.
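Baldwin's criteria can be applied mechanically to each column of a sequence alignment. The sketch below is a simplified illustration, not Baldwin's procedure: side-chain size is approximated by the number of side-chain heavy atoms, the set of "polar" residues and the size-range threshold are assumptions, and only the "fewer than 9 different amino acids" limit comes from the text. That limit is only meaningful for large alignments such as Baldwin's 204 sequences.

```python
# Side-chain heavy-atom counts, used here as a crude proxy for side-chain size.
SIDE_CHAIN_HEAVY_ATOMS = {
    "G": 0, "A": 1, "S": 2, "C": 2, "P": 3, "T": 3, "V": 3,
    "D": 4, "N": 4, "I": 4, "L": 4, "M": 4, "E": 5, "Q": 5,
    "K": 5, "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}
POLAR = set("DEKRHNQSTY")  # assumption: which residue types count as "polar"


def baldwin_criteria(column, max_types=8, max_size_range=3):
    """Evaluate one alignment column (a string, one residue per sequence).

    max_types=8 encodes "fewer than 9 different amino acids"; max_size_range
    (in heavy atoms) is a hypothetical threshold for a restricted size range.
    """
    types = {aa for aa in column.upper() if aa in SIDE_CHAIN_HEAVY_ATOMS}
    if not types:
        raise ValueError("column contains no recognised amino acids")
    sizes = [SIDE_CHAIN_HEAVY_ATOMS[aa] for aa in types]
    return {
        "conserved": len(types) == 1,
        "restricted_variability": len(types) <= max_types,
        "restricted_size": max(sizes) - min(sizes) <= max_size_range,
        "polar_accommodated": bool(types & POLAR),
    }


def faces_inward(column):
    """Predict an inward-facing (or helix-facing) position if any criterion holds."""
    return any(baldwin_criteria(column).values())


for column in ("DDDDDDDDDD", "LIVFAMWGCP"):
    print(column, faces_inward(column), baldwin_criteria(column))
```

Scoring every position along a helix this way and looking for the face where inward-predicted positions cluster is the kind of analysis that leads to statements such as helix 3 having very little lipid exposure.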
The Unger/Baldwin model became the new template on which many new models were based35'4~. However, Unger et al. subsequently published a three-dimensional electron density map of frog rhodopsin at 7.5 Å resolution44, shown in figure 4. This map provides a wealth of data to assist in modelling these receptors. Baldwin et al.45 reanalysed the 493 class A GPCR sequences available in 1997 in the same manner as
described previously28. With this increase in the amount of available sequence data, it was possible to define the position and length of each helix in more detail. In the previous model, helices were assumed to be 26 residues long; in the updated model, helices vary from 20 to 35 residues in length. A direct comparison between the current map and the projection map published in 1995 reveals a small but important difference: the angles between some neighbouring helices were underestimated in the earlier projection map. A number of models have been published using this new template36'4642. Figure 5 shows the backbone of bacteriorhodopsin and Baldwin's 1997 α-carbon template model of rhodopsin, viewed from the extracellular side. It is clear that the relative positioning and tilt of the helices differ between these two structures. All current receptor sequences for each class, family and subfamily are available from the G-protein coupled receptor database (GPCRDB) on the World Wide Web53,54. This database, developed and maintained by G. Vriend, is an invaluable source of data for GPCR modellers.
Figure 4. Three-dimensional electron density map of frog rhodopsin at 7.5 Å resolution45. The sections are viewed from the intracellular side. The section at 0 Å is taken as the centre of the lipid bilayer. In this section, the axis of helix 1 is marked on the left-hand side; the axes of helices 2, 3, 4, 5, 6 and 7 are marked clockwise from this point, with the axis of helix 3 lying between those of helices 4 and 7. Following the sections round in a clockwise manner, each section steps 4 Å towards the extracellular side. Reprinted by permission from J. Mol. Biol. (1997) 272, 144-164. Copyright 1997 Academic Press Ltd.
Table 2. Packing angles between pairs of helices for rhodopsin and bacteriorhodopsin55.
Helix pair                 7-1   1-2   2-3    3-4   4-5   5-6   6-7   2-7   3-6
Rhodopsin (deg.)          22.9  26.5  15.0  -28.2  25.2  22.4  13.5  34.4  35.6
Bacteriorhodopsin (deg.)   8.2  27.0   5.6  -10.1  20.3   7.7  12.8  21.2  18.2
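Packing angles such as those in Table 2 describe how two helix axes cross. One common way to compute such an angle is as the dihedral angle between the two axes about their line of closest approach, and the sketch below does exactly that. The sign follows the usual IUPAC dihedral convention; whether this matches the convention behind Table 2 is an assumption to check before comparing numbers. Axis points and directions would come from fitting lines through each helix's Cα atoms, which is not shown.

```python
import math


def _sub(a, b): return tuple(x - y for x, y in zip(a, b))
def _add(a, b): return tuple(x + y for x, y in zip(a, b))
def _scale(a, k): return tuple(x * k for x in a)
def _dot(a, b): return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))


def packing_angle(a1, u1, a2, u2):
    """Dihedral angle (degrees) between two helix axes about their common perpendicular.

    a1, a2: a point on each axis; u1, u2: axis directions (N- to C-terminus).
    """
    u1, u2 = _unit(u1), _unit(u2)
    b = _dot(u1, u2)
    denom = 1.0 - b * b
    if denom < 1e-12:
        raise ValueError("axes are parallel; packing angle undefined")
    w0 = _sub(a1, a2)
    d, e = _dot(u1, w0), _dot(u2, w0)
    p1 = _add(a1, _scale(u1, (b * e - d) / denom))  # closest point on axis 1
    p2 = _add(a2, _scale(u2, (e - b * d) / denom))  # closest point on axis 2
    b1, b2, b3 = _scale(u1, -1.0), _sub(p2, p1), u2
    gap = math.sqrt(_dot(b2, b2))
    if gap < 1e-9:
        raise ValueError("axes intersect; use a small offset")
    # Dihedral over the points p1 + u1, p1, p2, p2 + u2.
    y = gap * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


# Two axes 10 Å apart, the second tilted by 20° about the line joining them.
tilt = math.radians(20.0)
omega = packing_angle((0, 0, 0), (0, 0, 1), (10, 0, 0), (0, math.sin(tilt), math.cos(tilt)))
print(round(omega, 1))  # -20.0
```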
Figure 5. Backbone structures of bacteriorhodopsin (left) and Baldwin and Schertler's 1997 rhodopsin model (right) viewed from the extracellular side. Helix 1 lies to the bottom right in each case. In the rhodopsin model, helices 4, 6 and 7 are orientated perpendicular to the membrane. In the bacteriorhodopsin structure only helix 7 is shown perpendicular to the membrane, because helices 4, 6 and 7 cannot be orientated perpendicular at the same time, confirming that the helices are tilted differently in the two structures.
The major structural difference between the two structures lies in the packing angles between neighbouring pairs of helices, as shown in table 2. Helix pairs 2-3, 3-4, 5-6, 7-1, 2-7 and 3-6 in rhodopsin have significantly different angles from those in bacteriorhodopsin, which is a clear indication that bacteriorhodopsin is not a good template for GPCR models. The packing angle for neighbouring helices is generally around +23°, which is consistent with the 3-4 ridges-in-grooves packing class defined by Chothia et al.56. Here, a ridge is formed on the first helix between the Cα atoms at positions i and i+3, and this ridge packs into a groove on the second helix formed by two i, i+4 ridges. An exception is helix pair 3-4, which has a packing angle of -28°, consistent with the 4-4 packing class. This differs from the 3-4 class in that the ridge on the first helix is formed by residues at positions i and i+4. Also, the packing angle between helices 6 and 7 is +13.5°. This may still be accommodated in a 3-4 packing class by a decrease in the number of
residues per turn. This analysis of helix packing classes was used as a quality check on a β2-adrenergic receptor model constructed by Gkoutos and co-workers46. It is not a trivial matter to satisfy all of the packing constraints given in table 2 through well-defined ridges-in-grooves packing, and it was equally difficult to create a model that continued to satisfy these criteria during a molecular dynamics simulation. These criteria therefore provide a very useful check on GPCR structure. The most recent projection map of bovine rhodopsin extends to 5 Å resolution57. Two-dimensional crystals of bovine rhodopsin with p22₁2₁ symmetry were obtained, from which electron diffraction patterns could be recorded, giving the improved projection map shown in figure 6.
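The packing-angle quality check described above can be automated by comparing a model's angles with reference values. The sketch below uses the rhodopsin angles from Table 2 as the reference; the 8° tolerance is a hypothetical choice, and the model angles could come from any helix-axis analysis.

```python
# Packing angles (degrees) transcribed from Table 2.
RHODOPSIN = {
    "7-1": 22.9, "1-2": 26.5, "2-3": 15.0, "3-4": -28.2, "4-5": 25.2,
    "5-6": 22.4, "6-7": 13.5, "2-7": 34.4, "3-6": 35.6,
}
BACTERIORHODOPSIN = {
    "7-1": 8.2, "1-2": 27.0, "2-3": 5.6, "3-4": -10.1, "4-5": 20.3,
    "5-6": 7.7, "6-7": 12.8, "2-7": 21.2, "3-6": 18.2,
}


def packing_outliers(model, reference=RHODOPSIN, tolerance=8.0):
    """Return {helix pair: deviation} for pairs deviating from the reference by > tolerance.

    tolerance (degrees) is a hypothetical threshold, not a published criterion.
    """
    outliers = {}
    for pair, ref_angle in reference.items():
        if pair not in model:
            raise KeyError(f"model has no packing angle for helix pair {pair}")
        deviation = model[pair] - ref_angle
        if abs(deviation) > tolerance:
            outliers[pair] = round(deviation, 1)
    return outliers


# Treat the bR angles as if they were a GPCR model and check them against rhodopsin.
print(packing_outliers(BACTERIORHODOPSIN))
```

With this tolerance, the bacteriorhodopsin angles fail for pairs 7-1, 2-3, 3-4, 5-6, 2-7 and 3-6, the same six pairs singled out in the text. A real check would also re-evaluate the angles along a molecular dynamics trajectory, since the text notes that keeping them satisfied during simulation is the harder test.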
Figure 6. Projection density map of bovine rhodopsin p22₁2₁ crystals57 at 5 Å. One unit cell contains four rhodopsin molecules. Three clear peaks of electron density can be identified, corresponding to the helices perpendicular to the membrane. An increase in the electron density towards helix 5 (labelled) is also seen. Reprinted by permission from J. Mol. Biol. (1998) 282, 991-1003. Copyright 1998 Academic Press Ltd.
There are several observable differences between the 1995 and 1997 projection structures. Firstly, the curve of electron density towards helix 5 is longer in the 1997 map, with a more defined peak of density in this region; there is therefore more density between helices 3 and 5 than in previous projection maps. Secondly, the tilt of helix 5 (relative to the vertical) had been underestimated. Earlier projection maps24,27 showed helix 5 as a clear peak, suggesting that it was untilted, whereas the 1997 map shows an elongated peak for helix 5, indicating a well-defined tilt. Finally, electron density is now detected between helices 6 and 2 and between helices 6 and 7. The density between helices 6 and 7 may indicate that helix 7 is perturbed towards the extracellular side, a perturbation predicted from modelling studies58 as early as 1986. No single receptor model can yet be regarded as correct. However, since the early modelling work based on bacteriorhodopsin, receptor models have become increasingly accurate in their ability to explain experimental observations. Further improvements in the resolution of electron density maps will only add to the accuracy of these models, allowing them also to become predictive tools.
Interhelical domain
With the advances in projection structures of rhodopsin, modelling of the transmembrane helices has become increasingly accurate. Little structural data currently exist for the extracellular domain59,60; however, Yeagle et al. have predicted the structure of the interhelical domain of rhodopsin39,61-63. For the three cytoplasmic loops, peptides containing the 17, 26 and 22 amino acid residues of the first, second and third cytoplasmic loops of bovine rhodopsin, respectively, were synthesised. NMR spectra of the solvated peptides were recorded and the NOEs were used to generate a number of structures, with the best distance geometry structures undergoing energy refinement. The remaining structures were overlaid to give a representative structure61,62. The first cytoplasmic loop exhibited a well-ordered structure, forming a β-turn with two hydrogen bonds across the turn. The second cytoplasmic loop also exhibited a β-turn, with one hydrogen bond across the turn; in addition, both the N- and C-terminal ends of the il2 peptide were helical. Although these ends were not α-helical, the structure is consistent with the unwinding of the helices as they pass into the cytoplasm. The third cytoplasmic loop exhibits a different structure: although the N-terminal portion is poorly defined owing to a lack of constraints and the C-terminal portion has only a small number of constraints, the middle portion is well defined and displays an α-helical structure. Yeagle therefore proposed a turn-helix-turn motif for this loop. The formation of a fourth cytoplasmic loop by palmitoylation of cysteine residues in the C-terminus has been shown to be crucial for G-protein coupling in some receptor subtypes64. Yeagle et al. constructed a polypeptide containing 43 amino acid residues of the carboxyl-terminal domain of bovine rhodopsin and performed a study similar to that described above. The representative structure of the carboxyl terminus is organised into three subdomains.
The first subdomain, at the N-terminus of this peptide, forms an α-helix of 7 or 8 amino acids, which is believed to be a continuation of helix 7. The second subdomain contains the two cysteine residues, which lie near the putative lipid bilayer and are therefore accessible for palmitoylation; because these cysteines are located near the bilayer, a loop is formed between them and helix 7. The third subdomain is partially composed of an antiparallel β-sheet whose two strands are connected by a β-turn. This subdomain appeared to be the most exposed, which makes its phosphorylation sites readily available to rhodopsin kinase39. The independent solution structures of these four domains have provided insight into the structure of the cytoplasmic domain. Yeagle and co-workers solved the solution structures of these four peptides and showed that they form a complex when expressed together63. When this cytoplasmic face was added to the model proposed by Baldwin28, it fitted correctly and was in good agreement with a large amount of experimental data. With this more accurate representation of the cytoplasmic face, it may now be possible to identify specific interactions between the receptor and its associated G-protein. However, it must be noted that some uncertainty remains in the structure owing to the lack of NMR constraints. A number of theoretical studies have attempted to elucidate this interaction, and this work is reviewed in a later section.
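The step from NOE spectra to distance geometry structures relies on converting NOE intensities into distance restraints. A common approach, assumed here rather than taken from the Yeagle studies, is the isolated spin-pair approximation, in which intensity scales as r⁻⁶ and is calibrated against a cross-peak of known distance. The reference distance and the strong/medium/weak class boundaries below are hypothetical conventions.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Isolated spin-pair approximation: NOE intensity is proportional to r**-6."""
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("NOE intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical strong/medium/weak classes: (upper cutoff, (lower bound, upper bound)) in Å.
NOE_CLASSES = [(2.7, (1.8, 2.7)), (3.3, (1.8, 3.3)), (5.0, (1.8, 5.0))]


def distance_bounds(distance):
    """Map a calibrated distance onto a restraint range, or None if too long to use."""
    for cutoff, bounds in NOE_CLASSES:
        if distance <= cutoff:
            return bounds
    return None


# Reference cross-peak of intensity 1.0 assumed to correspond to 2.5 Å.
for intensity in (2.0, 0.5, 0.05):
    d = noe_distance(intensity, ref_intensity=1.0, ref_distance=2.5)
    print(f"I = {intensity}: r = {d:.2f} Å, bounds = {distance_bounds(d)}")
```

Because of the sixth-root dependence, a 64-fold drop in intensity only doubles the estimated distance. This is why NOE restraints are usually expressed as generous ranges rather than exact values.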
Ligand binding
Identification of the ligand binding domain is crucial to the study of GPCRs. However, given the large number of GPCR agonists and antagonists, identifying the residues involved in ligand binding is not trivial. Moreover, the position of the ligand binding domain is not conserved throughout the GPCR superfamily2. The binding domain of the biogenic amine receptors has been shown to lie within the helix bundle, approximately 10-15 Å below the membrane surface. However, GPCRs that bind peptides, such as the opiate, bradykinin and neurokinin receptors, form a binding pocket between the N-terminus, the extracellular loops and a cavity within the helix bundle65. A number of GPCRs also have an unusually large N-terminal domain (300-400 residues) that can bind their ligand. These include the glycoprotein hormone receptors (for the gonadotrophins, follicle-stimulating hormone and thyroid-stimulating hormone) and some class 2 receptors, e.g. those for glucagon and glucagon-like peptide 2. A fourth type of ligand-binding domain is exhibited by the thrombin receptor: thrombin cleaves the receptor's N-terminus, generating a new N-terminus that is believed to interact with the remainder of the receptor, resulting in activation2. Figure 7 shows the positions of some ligand binding sites in various GPCR subfamilies.
Figure 7. A schematic diagram of the proposed ligand binding domains, shown in light grey, in various GPCRs. The helices are shown as cylinders, the loops, N-termini and C-termini are shown as lines. (A) Biogenic amine receptors, (B) Peptide receptors, (C) Glycoprotein hormone receptors, (D) Thrombin receptor.
In modelling the ligand binding domain, it is not sufficient to identify whether a ligand binds in the transmembrane domain or to the N-terminus. Site-directed mutagenesis data have provided modellers with the greatest source of information on which residues play a role in ligand binding. Indeed, the amount of mutation data is continually growing and is widely available from the GRAP and tinyGRAP66,67 mutant databases on the World Wide Web (http://wwwgrap.fagmed.uit.no/GRAP/homepage.html). A number of three-dimensional structures of GPCRs containing their associated ligands have been constructed, based on a number of different templates8,28,44,45. Here we review several studies of ligand-receptor complexes.
Amine receptors
The β2-adrenergic receptor is, along with rhodopsin, one of the most widely studied of all GPCRs. These receptors are involved in the cardiovascular system and bind their natural ligands, epinephrine and norepinephrine. A number of β2-adrenergic receptor models have been built and their putative ligand binding domains identified12,18,35,40,68. In an early study, Maloney-Huss and co-workers18 built a clockwise receptor model based on the hamster sequence. The transmembrane domain was constructed by de novo modelling (without a structural template), with the most polar surface of each helix orientated to face inward, forming a central core. It was noted that a number of residues implicated in epinephrine binding by site-directed mutagenesis lined this central core. Epinephrine was crudely docked into this binding pocket, with the protonated amine of the ligand forming an ionic interaction with Asp 322, which is fully conserved throughout the amine subfamily. It was also found that the two hydroxyl groups on the aromatic ring could form hydrogen bonds with two serines on helix 5, Ser 513 and Ser 516, which mutation data had also implicated in ligand binding. Gouldson et al.68 probed the binding domain of their β2-adrenergic model, based on Schertler's projection structure24, using GRID69. The hydrogen bond acceptor was clearly identified as Asp 322, while the hydrogen bond donor is one or more of three serines on helix 5: Ser 512, Ser 513 and Ser 516. A further study by Gouldson et al.35 exploited the domain structure of GPCRs, which have been shown to comprise two independently folding domains: the N-terminus with helices 1-5, and helices 6-7 with the C-terminus70. Again, GRID71 was used to analyse the binding domain of an updated model.
Two hydrophobic pockets were identified in the transmembrane part of the first domain, the first formed by Met 119, Val 123, Ile 126, Val 127, Met 227, Gly 228, Val TM, Phe TM and Val 326, and the second by Val 323, Ile 416 and Phe 517. In previous studies68, the docking of ligands had proved difficult because of steric hindrance by Trp 618 and Phe 622. This was overcome by translating the second domain (helices 6 and 7) about 5-7 Å away in the membrane plane. The amino group of each ligand was then placed near Asp 322, with the hydrophobic portion aligned in one of the hydrophobic pockets. The system then underwent energy minimisation and molecular dynamics to reassociate the second domain, forming the receptor-ligand complex. This novel approach to ligand docking proved successful, as the ligand satisfied a number of key interactions suggested by experimental data. These include interactions between the amino group of the ligand and Asp 322, between the catechol hydroxyl groups and Ser 513 and Ser 516, and between the aromatic ring and Trp 618 and Phe 622. Gouldson et al.29 and Oliveira et al.14 also used the technique of correlated mutation analysis (CMA) to identify residues involved in
ligand binding. Because sequences within GPCR subfamilies are highly homologous, it is possible to identify residues that are important for the structure and function of the protein. In both studies, many of the internal residues identified by CMA correlate with residues involved in ligand binding and subtype specificity. Indeed, mutating residues that CMA identified as potentially involved in subtype specificity was found to alter the ligand binding properties of those receptors. Despite some minor differences between authors, the general consensus on the binding mode of amine ligands to GPCRs is one of the successes of GPCR modelling; a comparative table showing the extent of agreement for the β2-adrenergic receptor is given in reference 35. Other members of the amine receptor subfamily have been studied and their ligands docked using computational techniques, including the dopamine12,13,16, serotonin12,19,73 and muscarinic receptors9,74,75. In each study, the ligand was docked interactively using molecular graphics and some experimental data, and the complexes underwent energy minimisation and molecular dynamics simulations to obtain an energetically favourable position for the ligand. As with the β2-adrenergic receptors, Asp 322 is implicated in binding the amino group of the ligand and was used in each case as an anchor point. The dopamine receptors, like the β2-adrenergic receptors, have the conserved serines on helix 5, and in each dopamine model Ser 513 and Ser 516 were identified as ligand binding residues, in good agreement with experiment. In both the muscarinic and serotonin receptors, Ser 513 and Ser 516 are not conserved; in the muscarinic receptors they are replaced by alanines. The natural ligand, acetylcholine, lacks the phenolic hydroxyl groups, so alternative contacts are made.
These have been identified as Thr 512 and Asn 622, an assignment supported by experimental data and modelling studies. The serotonin receptors bind 5-HT, which contains one hydroxyl group, and modelling of these receptors revealed that Ser 512 could form a hydrogen bond with it, again in agreement with experimental data68.
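The correlated mutation analysis (CMA) mentioned above looks for alignment positions whose residues change together across a family. The exact scoring used by Gouldson et al. and Oliveira et al. is not described here, so this sketch substitutes mutual information between alignment columns, one common covariation measure. The alignment and the threshold are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two alignment columns of equal length."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same set of sequences")
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def correlated_positions(alignment, threshold=0.5):
    """Return (i, j, MI) for every pair of alignment positions with MI >= threshold."""
    length = len(alignment[0])
    columns = ["".join(seq[k] for seq in alignment) for k in range(length)]
    hits = []
    for i in range(length):
        for j in range(i + 1, length):
            mi = mutual_information(columns[i], columns[j])
            if mi >= threshold:
                hits.append((i, j, round(mi, 3)))
    return hits


# Hypothetical four-sequence alignment: positions 0 and 2 co-vary (S with N, T with D).
alignment = ["SAN", "SGN", "TAD", "TGD"]
print(correlated_positions(alignment))  # [(0, 2, 1.0)]
```

A fully conserved column always scores zero, so CMA-style measures highlight variable positions that change in step with each other. This is consistent with their use in finding residues behind subtype specificity rather than residues shared by the whole family.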
Peptide receptors
The peptide subfamily, although less well studied and modelled than the amine subfamily, is important in many functions, including pain (opioid), uterine smooth muscle contraction (oxytocin) and food intake (neuropeptide Y). A number of molecular modelling studies have been carried out on different members of this family. The cannabinoid receptors bind a number of ligands, which can be divided into four groups: classical cannabinoids, fatty acid amides and esters, bicyclic or nonclassical cannabinoids, and aminoalkylindoles. Mahmoudian constructed a model of the CB1 receptor based on the bR template15. The ligand Δ9-tetrahydrocannabinol (Δ9-THC), a classical cannabinoid, was docked into the internal cavity using the AUTODOCK set of programs76. The ligand was initially placed in the central cavity, at the position corresponding to retinal in bR, and several hundred docking runs were carried out to search for the best ligand-receptor binding position. The ligand was predicted to bind in a hydrophobic pocket consisting of Met 419, Trp 42~, Trp 618, Leu TM, Leu 622 and Ala 529. The phenolic hydroxyl group of Δ9-THC formed a hydrogen bond with the carboxy group of Ala 324. Mahmoudian claimed that the position of the docked ligand was consistent with three-dimensional QSAR studies77, which showed that an area of steric repulsion behind the C-ring of Δ9-THC is responsible for decreased activity, whereas steric bulk in the C-3 side chain contributes to increased binding affinity.
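The multi-run docking search described above can be illustrated with a toy sketch: many independent runs, each starting from a random ligand placement and refined downhill, with the lowest-energy pose kept. This is not AutoDock, which uses more elaborate searches (simulated annealing, and a genetic algorithm in later versions) over rotations and torsions as well. Here the ligand is moved only by rigid translations, and the receptor atoms, ligand atoms and Lennard-Jones parameters are all hypothetical.

```python
import math
import random

EPSILON, SIGMA = 0.2, 3.5  # hypothetical Lennard-Jones parameters (kcal/mol, Å)


def score(ligand, receptor):
    """Toy 12-6 Lennard-Jones interaction energy between two point sets."""
    energy = 0.0
    for p in ligand:
        for q in receptor:
            r = max(math.dist(p, q), 0.5)  # guard against r -> 0
            sr6 = (SIGMA / r) ** 6
            energy += 4.0 * EPSILON * (sr6 * sr6 - sr6)
    return energy


def translate(points, shift):
    return [tuple(c + s for c, s in zip(p, shift)) for p in points]


def dock(ligand, receptor, n_runs=50, n_steps=100, box=6.0, step=0.5, seed=1):
    """Independent random restarts, each refined by greedy rigid-body translation moves."""
    rng = random.Random(seed)
    best_pose, best_energy = None, math.inf
    for _ in range(n_runs):
        shift = [rng.uniform(-box, box) for _ in range(3)]  # random start in the search box
        energy = score(translate(ligand, shift), receptor)
        for _ in range(n_steps):
            trial = [s + rng.gauss(0.0, step) for s in shift]
            trial_energy = score(translate(ligand, trial), receptor)
            if trial_energy < energy:  # keep only downhill moves
                shift, energy = trial, trial_energy
        if energy < best_energy:
            best_pose, best_energy = translate(ligand, shift), energy
    return best_pose, best_energy


# Hypothetical cavity: 8 receptor atoms on a 6 Å ring; a 3-atom rod-like ligand.
receptor = [(6.0 * math.cos(2 * math.pi * k / 8), 6.0 * math.sin(2 * math.pi * k / 8), 0.0)
            for k in range(8)]
ligand = [(0.0, 0.0, -1.5), (0.0, 0.0, 0.0), (0.0, 0.0, 1.5)]
pose, energy = dock(ligand, receptor)
print(f"best score {energy:.2f} kcal/mol")
```

Running many short, independent searches is a simple guard against a single run becoming trapped in a local minimum, which is the motivation for repeating the docking several hundred times.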
Cannabinoid agonists exhibit little or no subtype specificity. However, WIN55212-2, a prototypic aminoalkylindole, has a higher affinity for CB2 than for CB1. Song et al. have carried out a combined experimental and modelling study to identify the residues responsible for this subtype specificity49. Models were constructed based on the projection structure of frog rhodopsin27 at 7.5 Å. WIN55212-2 was initially docked in an s-trans conformation in the hydrophobic pocket formed by helices 3, 4 and 5, using interactive molecular graphics, and the receptor-ligand complex underwent energy minimisation and molecular dynamics simulations to obtain a minimum-energy structure. Two possible binding domains were identified. The first was an intrahelical aromatic stack formed by Phe TM, Phe 235, Phe 238 and Phe 241. The second, formed by Phe 316, Trp 42~, Trp 515 and Phe 326, was deemed the more likely interaction region because the aromatic stack has both an upper and a lower side. Docking the ligand in this region formed a continuous stack of aromatic residues over several turns of helices 3, 4 and 5, which is likely to be energetically favourable. The binding site was in good agreement with the experimental data of Shire et al.78. With the ligand bound in this region, the indole ring of the ligand was found to have a strong aromatic interaction with Phe 518, which is unique to CB2. The involvement of this residue was confirmed by site-directed mutagenesis: the mutation V518F in the CB1 receptor resulted in a 12- to 13-fold increase in affinity for WIN55212-2, whereas the mutation F518V in the CB2 receptor resulted in a 14-fold decrease. In a second study, by Tao et al.50, CP-55,940 was docked into both the CB1 and CB2 models, in a hypothesised hydrophobic binding pocket around helices 6 and 7, in the same manner as described above. The orientation of the ligand was predicted to differ between the receptor subtypes.
In CB1, the major hydrogen bonds are between the phenolic hydroxyl group and Lys 318, the C1 hydroxyl group and Trp 5~5, and the C4 side-chain hydroxyl and Asn 725. In CB2, however, the major hydrogen bonding interactions are between the C4 side-chain hydroxyl and Lys 318, the phenolic hydroxyl group and Asn 725, and the C1 hydroxyl group and both Ser 332 and Thr 336. The mutation K318A in the CB1 receptor resulted in a significant drop in affinity for CP-55,940, as only the Asn 725 hydrogen bond and the hydrophobic interactions are retained. In the CB2 receptor, however, three hydrogen bonding sites are retained, so only a slight decrease in affinity relative to the wild-type CB2 receptor is observed. Reggio has produced an excellent review on modelling the cannabinoid receptors79. Mouillac and co-workers have modelled the binding site of the vasopressin receptor80. The model was constructed based on the 9 Å electron density map of bovine rhodopsin24. Arginine vasopressin (AVP) was manually docked into the proposed ligand-binding site of the minimised structure according to several criteria. Firstly, the side chain of Arg 8 of AVP had to be near the first extracellular loop. Secondly, the hydrophobic part of AVP had to fit into the hydrophobic pocket, while the polar part had to fit in the polar zone of the receptor. Thirdly, the ligand-receptor hydrogen bonding network had to be optimal. This docking procedure was repeated a number of times with different initial orientations of the ligand and side chains, but only one solution was found to satisfy all the criteria. The final receptor-ligand complex was then energy minimised. The ligand was found to bind in a hydrophobic pocket defined by helices 2 to 7 and formed by residues Val 323, Ala 325, Met 326, Phe 517, Tyr 614, Trp 618, Phe 621, Phe 622, Ala 719 and Ala 722.
However, the entrance to the transmembrane domain and the binding pocket was found to be hydrophilic and comprises Gln 231, Gln 235, Lys 319, Gln 322, Ser 427, Thr 513, Cys 617, Gln 625 and Ser 723, together with a further glutamine on helix 4 and a threonine on helix 5. The side chain of Arg 8 of the ligand was found to be close to el1, in agreement with photoaffinity labelling studies 81. The model was then used to identify residues involved in hormone binding, and several of these residues were mutated to test their involvement. The mutations Q231A, Q235A and Q322A were found to decrease the affinity for agonist without affecting the affinity for antagonist. A second study, by Phalipou and co-workers 82, identified the antagonist-binding domain. The model was built in the same fashion as described previously. The photoactivatable ligand [125I][Lys(3N3Phpa)8]HO-LVA was docked into the model by superimposing its backbone and side chains onto the corresponding features of AVP, and the new ligand-receptor complex was minimised to remove any unfavourable interactions. Again, a number of ligand-binding residues were identified, and mutagenesis showed that several of them form key interactions with the ligand. The interaction between the peptide substance P and the tachykinin NK1 receptor has been modelled by Saebo et al 83. Five individual sub-domains of NK1, corresponding to helix 1, helix 2, helix 3, el1 and el2, were modelled. Experimental data suggest that substance P interacts with specific residues in these five sub-domains, so substance P was placed such that those residues could interact with it. The DOCKING module of the InsightII suite of programs 84 was used to calculate the intermolecular van der Waals and electrostatic interaction energies. The non-bonded energy was then used as a guide to the preferred relative orientation of the two molecules, and the final structure was energy minimised. Two sets of calculations were carried out on this structure, one in a solvated medium and the other in vacuum, and the interaction energy of substance P with the model was calculated in each medium.
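The kind of non-bonded interaction energy used to guide the substance P docking can be illustrated with a minimal sketch. It assumes point charges, a single generic Lennard-Jones pair for every atom pair (the `epsilon` and `sigma` defaults are hypothetical placeholders, not InsightII parameters; a real force field uses per-atom-type parameters) and a uniform dielectric constant as a crude stand-in for the difference between vacuum and a solvated medium.

```python
import numpy as np

# Coulomb constant in kcal mol^-1 Å e^-2 (converts q_i * q_j / r to kcal/mol)
COULOMB_KCAL = 332.0636


def interaction_energy(xyz_a, q_a, xyz_b, q_b, epsilon=0.1, sigma=3.5, dielectric=1.0):
    """Intermolecular van der Waals and electrostatic energies (kcal/mol).

    xyz_a, xyz_b   : (n, 3) coordinates in Å of the two molecules
    q_a, q_b       : (n,) partial charges in units of e
    epsilon, sigma : one generic Lennard-Jones pair for all atom pairs
                     (hypothetical placeholder values)
    dielectric     : uniform dielectric constant (1 corresponds to vacuum)
    """
    xyz_a, xyz_b = np.asarray(xyz_a, float), np.asarray(xyz_b, float)
    q_a, q_b = np.asarray(q_a, float), np.asarray(q_b, float)
    # All intermolecular pair distances, shape (n_a, n_b)
    r = np.linalg.norm(xyz_a[:, None, :] - xyz_b[None, :, :], axis=-1)
    sr6 = (sigma / r) ** 6
    # 12-6 Lennard-Jones term summed over all pairs
    e_vdw = float(np.sum(4.0 * epsilon * (sr6 ** 2 - sr6)))
    # Screened Coulomb term summed over all pairs
    e_elec = float(np.sum(COULOMB_KCAL * np.outer(q_a, q_b) / (dielectric * r)))
    return e_vdw, e_elec
```

Scanning candidate placements of the ligand and keeping the one with the lowest `e_vdw + e_elec` mirrors, in simplified form, the use of the non-bonded energy as a guide to the preferred orientation; raising `dielectric` from 1 to a large value screens the electrostatic term proportionally.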
These substance P calculations predicted that a number of residues on helix 2 are crucial for receptor binding and activation, in good agreement with the residues proposed by Huang et al 85. The bradykinin receptor-ligand complex has been modelled by Kyle et al 65. The receptor was constructed by homology modelling, with the bacteriorhodopsin structure as the initial template. Computational and NMR studies indicated that the ligand adopts a C-terminal β-turn when complexed with the receptor. Harmonic constraints were therefore used to enforce a β-turn in a tetrapeptide probe. This probe was then translated through the model in a systematic manner, similar to but more elaborate than GRID 69, and the complex was energy minimised at each new position. The overall structure of the receptor was maintained by a weak harmonic constraint on the receptor backbone. The probe calculations indicated that the C-terminal portion of the ligand could be accommodated in the central part of the receptor, near the extracellular domain. Using this information to steer the search, the entire bradykinin molecule was then used as a probe in the same manner. Twenty-four geometric orientations were sampled at each of the 100 grid points identified during the initial tetrapeptide probing, and the resulting 2400 positions of bradykinin within the receptor were used as starting points for geometry optimisation and the calculation of interaction energies. Complexes with an interaction energy of less than 150 kcal mol⁻¹ were considered the most likely candidates. Of the 17 structures that remained, one was chosen because it agreed with site-directed mutagenesis data. The model was also validated on the basis of its predictive capabilities: it predicted that the side chain of Arg 1 of the ligand could interact with Asp 628 and Asp 712.
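The two-stage systematic search described for bradykinin (a set of grid points, 24 orientations per point, then an energy cut-off) can be sketched as follows. The source does not say how the 24 orientations were chosen; this sketch assumes the 24 proper rotations of a cube. It also scores each rigid starting pose directly with a caller-supplied energy function, whereas the original protocol minimised every pose first. All function names, `energy_fn` and the cut-off value are hypothetical.

```python
import itertools

import numpy as np


def cube_rotations():
    """The 24 proper rotations of a cube: signed permutation matrices with det = +1."""
    mats = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product((1.0, -1.0), repeat=3):
            m = np.zeros((3, 3))
            for row, (col, s) in enumerate(zip(perm, signs)):
                m[row, col] = s
            if np.isclose(np.linalg.det(m), 1.0):  # keep rotations, drop reflections
                mats.append(m)
    return mats


def systematic_poses(ligand_xyz, grid_points, rotations):
    """Yield (grid_index, rotation_index, coordinates) for every starting pose."""
    centred = ligand_xyz - ligand_xyz.mean(axis=0)
    for gi, point in enumerate(grid_points):
        for ri, rot in enumerate(rotations):
            # Rotate about the ligand centroid, then place the centroid on the grid point
            yield gi, ri, centred @ rot.T + point


def select_candidates(poses, energy_fn, cutoff):
    """Keep the (grid, rotation) indices of poses scoring below the cut-off."""
    return [(gi, ri) for gi, ri, xyz in poses if energy_fn(xyz) < cutoff]
```

With 100 grid points and `cube_rotations()`, `systematic_poses` yields 100 × 24 = 2400 starting poses, the same count as in the bradykinin study; in practice `energy_fn` would wrap a minimisation followed by an interaction-energy evaluation.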
Mutation of either Asp 628 or Asp 712 to alanine reduced bradykinin binding affinity by 19- and 28-fold, respectively. The model, however, predicted that Arg 1 interacts with both residues, and the double mutation indeed reduced bradykinin binding affinity 500-fold, confirming the involvement of both residues and supporting the validity of the model. A further hypothesis was that antagonist peptides might bind in a similar fashion to bradykinin. A modified prototypical antagonist, containing an intact C-terminal β-turn with appropriate side chains to retain the primary electrostatic interactions, represented a substantial pharmaceutical improvement in that it is less peptidic. This strategy of identifying the ligand-binding site and designing structurally appropriate ligands has therefore, in this case, opened the possibility of synthesising more potent compounds.
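Fold changes in affinity, such as the 19-, 28- and 500-fold losses above, are often easier to compare once converted to binding free energy differences, ΔΔG = RT ln(fold change). A minimal sketch, assuming the fold changes are ratios of dissociation constants (mutant over wild type) at 298 K; the double-mutant-cycle coupling term it computes is a standard analysis that the source does not itself report, and the mutation labels are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def ddg_from_fold(fold_loss, temperature=298.15):
    """Binding free energy penalty (kcal/mol) for a `fold_loss`-fold drop in affinity,
    i.e. K_d(mutant) / K_d(wild type) = fold_loss."""
    return R_KCAL * temperature * math.log(fold_loss)


# Fold losses quoted in the text for the bradykinin receptor
singles = {"D628A": 19.0, "D712A": 28.0}
double = 500.0

ddg_single = {name: ddg_from_fold(fold) for name, fold in singles.items()}
ddg_double = ddg_from_fold(double)

# Double-mutant-cycle coupling: zero if the two effects are independent (additive)
coupling = ddg_double - sum(ddg_single.values())
```

With these inputs the 500-fold loss corresponds to roughly 3.7 kcal/mol, and the two single mutations to roughly 1.7 and 2.0 kcal/mol.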
Structural changes
The binding of a ligand, be it an agonist or an antagonist, results in conformational changes within the receptor. These structural changes, which are involved in the signal transduction pathway, have been studied both experimentally and theoretically. In many GPCRs, the ligand first has to pass through the extracellular domain and enter a hydrophobic pocket within the transmembrane domain. However, the majority of GPCRs have a tightly bound extracellular canopy, usually containing a disulphide bridge, which makes entry of the ligand difficult. The extracellular loops must therefore undergo some kind of structural change to allow the ligand to enter. Kamiya and Reynolds 86 used Brownian dynamics to study a model of the extracellular domain of the β2-adrenergic receptor. The simulation comprised 80 ps of equilibration followed by 30 ns (30 000 ps) of data collection, with snapshots taken every 1 ps. The minimum area required for norepinephrine to pass through was estimated at 110 Å², while the areas for propranolol and the photoaffinity ligand ICYP-da were 140 and 170 Å², respectively. At no point during the simulation, however, did the loops present an open area greater than 150 Å² or remain open for longer than 2-3 ns. It was suggested that this opening may be sufficient to allow small ligands into the transmembrane domain but insufficient for larger ligands, and therefore that helical movement must accompany movement of the extracellular loops to let the ligand through. Spin-labelling studies have suggested movements of the helical domain during activation 87-89, and a number of computational studies have investigated these movements in an attempt to elucidate the structural changes brought about by ligand binding. Gouldson et al 36 constructed models of the β2-adrenergic receptor containing agonists, antagonists and partial agonists.
Molecular dynamics simulations of 500 ps were then carried out on each receptor, with data collected every 50 ps. The major structural changes were found to occur in the intracellular halves of helices 5 and 6. The agonist-induced changes in helices 5 and 6 were thought to be large enough to induce a conformational change in il3. An alternative hypothesis suggested that the change in the tilt of these helices might favour formation of a 5,6-dimer, owing to more optimal helix packing at the dimer interface. Helices 5 and 6 in the apo receptor were found to be essentially perpendicular to the membrane, whereas after the agonist-induced structural changes their tilt had changed by approximately 20°. Strahs et al 43 reported similar results in a comparative analysis of helix motions in the δ, κ and μ opioid receptors, again using molecular dynamics simulations, in this case over 2 ns. Correlated motions were seen between helices 5 and 7, 5 and 3, and 3 and 4. These motions indicated correlations between helices 5 and 6 and between the cytoplasmic ends of helices 3 and 5, in good agreement with the experimental evidence 87-89. Zhang et al also observed these helical motions in their model of a 5-HT2 receptor, in which the helical domain is assembled from a mismatch of helices, as described previously. Molecular dynamics simulations of receptor-ligand complexes containing agonists or antagonists revealed different structural changes in each complex. Binding of the agonist 5-HT produced large structural changes in the intracellular domain of helices 5 and 6, consistent with other studies 42,68,87-89. Binding of the antagonist 5HGR (5-hydroxygramine), by contrast, produced larger structural changes in helices 1-3 than in helices 4-7, with the greatest changes on helix 2. A second agonist, tryptamine, which has a lower efficacy than 5-HT, was also docked and produced similar but smaller structural changes than the full agonist. It was therefore inferred that ligands with different efficacies bring about different structural changes in the receptor, resulting in different pharmacological responses. Indeed, Gouldson et al 35 noted that the partial agonist pindolol caused a change in helix 5 but essentially none in helix 6, a change midway between those caused by an agonist and an antagonist. Fanelli and co-workers have carried out a large number of studies on the activation of the α1B-adrenergic and m3 muscarinic receptors 51,52,74,75,90. In early studies 74,75, the adrenergic and muscarinic receptor models were built from a modified 'bacteriorhodopsin-like' input structure 41, using the hamster and rat sequences, respectively.
A number of ligands, both agonists and antagonists, were docked using interactive molecular graphics, with experimental data used as structural constraints, and energy minimisation and molecular dynamics were used to bring about the resulting structural changes 74,75, as described previously. The apo structures were analysed to identify the hydrogen-bonding interactions of the most conserved polar amino acids, which form a polar pocket. The agonist- and antagonist-bound receptors were then reanalysed to identify any changes in this hydrogen-bonding network. Binding of agonists perturbed the hydrogen-bonding network found in the apo receptor, because several of these conserved polar residues interact directly with the ligand. The rearrangement of the polar pocket was also caused by movement of helices 3-7; in the muscarinic receptor model, the helices moved by about 2 Å with respect to the apo protein. In both receptors, the structural rearrangements caused Arg 340 of the DRY motif, which is crucial in G-protein activation, to move out of the polar pocket, possibly exposing this residue for interaction with the G-protein. The binding of antagonists caused structural changes that differ from those caused by agonists, with movement found mainly in helices 1-3; again, in the muscarinic receptor model, this movement was about 2 Å with respect to the apo protein. The agonist-induced movement is consistent with the changes identified in previous studies 18,35,42,87-89. The mobility of this region has recently been studied by site-directed spin labelling 91. Mutation of a number of residues can result in agonist-independent activation of the receptor. These constitutively active mutants, a number of which are found in the C-terminal portion of il2, led to the hypothesis that GPCRs exist in an equilibrium between two interconvertible allosteric states 92, R and R*.
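Helix tilts such as the roughly 20° change reported above for helices 5 and 6 are usually measured as the angle between a fitted helix axis and the membrane normal. A minimal sketch, assuming the membrane normal is the z axis and approximating the helix axis by the principal axis of the Cα coordinates (adequate for helices of several turns; dedicated analysis tools fit the axis more carefully). The ideal-helix and rotation helpers exist only to generate test coordinates.

```python
import numpy as np


def ideal_helix(n_res, rise=1.5, radius=2.3, residues_per_turn=3.6):
    """Cα trace (Å) of an ideal α-helix running along +z."""
    t = np.arange(n_res)
    phase = 2.0 * np.pi * t / residues_per_turn
    return np.column_stack([radius * np.cos(phase), radius * np.sin(phase), rise * t])


def rotate_about_x(xyz, degrees):
    """Rigidly rotate coordinates about the x axis."""
    a = np.radians(degrees)
    rot = np.array([[1.0, 0.0, 0.0],
                    [0.0, np.cos(a), -np.sin(a)],
                    [0.0, np.sin(a), np.cos(a)]])
    return xyz @ rot.T


def helix_axis(ca_xyz):
    """Unit vector along the principal axis of the Cα coordinates, N- to C-terminus."""
    centred = ca_xyz - ca_xyz.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    axis = vt[0]  # direction of largest variance = long axis of the helix
    if np.dot(axis, ca_xyz[-1] - ca_xyz[0]) < 0:
        axis = -axis
    return axis


def tilt_angle(ca_xyz, normal=(0.0, 0.0, 1.0)):
    """Angle (degrees, 0-90) between the helix axis and the membrane normal."""
    n = np.asarray(normal, float)
    n = n / np.linalg.norm(n)
    cos_angle = abs(float(np.dot(helix_axis(ca_xyz), n)))
    return float(np.degrees(np.arccos(min(cos_angle, 1.0))))
```

The tilt change between an apo and a ligand-bound model is then simply the difference of the two `tilt_angle` values computed for the same helix.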
Fanelli and co-workers recognised two drawbacks of the previous strategies for identifying agonist- and antagonist-induced structural changes 52,90. First, the different states investigated depend on the position of the ligand in the receptor and on the orientation of the residues that interact with it. Second, the method cannot address the mechanism of constitutive activation. In a new study, Fanelli and co-workers therefore proposed a method to study the structural and dynamic features associated with the active state of a GPCR, and the transition from R to R*, independently of agonist binding. All 19 possible substitutions of Ala 604 in the α1b-adrenergic receptor have been found to have varying abilities to induce constitutive activation 93. Five receptor models were constructed: the wild type, two weakly (A604S and A604D) and two strongly (A604K and A604E) constitutively active mutants. Each model underwent 150 ps of molecular dynamics, of which the last 100 ps were used for data collection, with structures saved every 0.5 ps. The resulting 200 structures of each receptor were used to generate an average structure, which was then minimised. Comparison of the five minimised receptors revealed a number of structural differences. The most noticeable change involved Arg 340. In the wild-type receptor, this residue faces inward towards helices 1 and 2, forming a hydrogen bond with Asp 224. In the mutant receptors, however, the arginine is progressively rotated out of the polar pocket, consistent with previous observations 74,75. In fact, the extent of the shift of Arg 340 appears to be related to the ability of the mutant to induce constitutive activity. In the model, Ala 604, which lies in the α-helical C-terminal domain of il3, faces the α-helical N-terminal domain of this loop. Substitution of Ala 604 on helix 6 by any other residue may allow the mutated residue to interact with Tyr 532 and Lys 536 on helix 5.
This interaction may be sufficient to promote rigid-body motion of helices 5 and 6, resulting in the rotation of Arg 340 out of the polar pocket. This may confirm that the main role of Ala 604 in the wild-type receptor is to constrain the receptor in its inactive form, and offers an explanation of how this is achieved. Protonation of Glu 339 in rhodopsin has been implicated in the transition of the receptor from R to R* 94. The corresponding residue in the α1B-adrenergic receptor, Asp 339, was protonated in the wild-type receptor, which then underwent a molecular dynamics simulation following the protocol described above. The resulting minimised average structure was then compared with the previous structures. The protonated wild-type receptor was found to share a number of features otherwise seen only in the mutant receptors, crucially the rotation of Arg 340 out of the polar pocket. Protonation of Asp 339 breaks the salt bridge between Asp 339 and Arg 340, resulting in the rigid-body motions described above. Therefore, although activation is brought about in different ways, protonation of Asp 339 and the constitutively activating mutations produce structures that share a common cytosolic domain, thus defining the structure of R*. However illuminating these studies are, simulations of the role of protonation in essentially gas-phase receptor models must be interpreted with care. Verification of these results must wait until simulations in the presence of lipid and solvent are more common; for an example of a simulation of a GPCR in a hydrated lipid bilayer, see reference (95). To test the validity and accuracy of this technique, two different mutations of Asp 339, D339A and D339N, were simulated. The mutation to asparagine perturbed the receptor, resulting in a shift of Arg 340 out of the conserved polar pocket.
However, the resulting conformation of il2 was found to be different from the structures observed for the constitutively active mutants, A604E and A604K, and the protonated Asp 339. The mutation was therefore predicted to show a poor level of constitutive activity. The second mutation, D339A, resulted in a remarkably similar structure to
those observed for the A604E activating mutation, indicating that this mutation may show a high level of constitutive activation. These site-directed mutations were then made experimentally. As predicted, D339N showed very little constitutive activity compared with the wild-type receptor. The D339A mutation, however, showed a marked increase in constitutive activity, namely an increase of more than 500% in IP concentration above the basal level. This increase was shown to be receptor-mediated by its sensitivity to the antagonists prazosin and phentolamine, which inhibited it. Both predictions of the mutants' ability to induce agonist-independent activation were therefore confirmed. The minimised average structures of the wild-type α1b-adrenergic receptor in both the inactive (R) and active (R*) states, along with the minimised structures of the constitutively active mutants A293E and D142A, are shown in reference (52). The role of Asn 130 in stabilising the polar pocket, which is believed to be important in controlling the movement of Arg 340 and thus in restraining the receptor in an inactive conformation, has also been investigated. In this polar pocket, consisting of Asn 130, Asp 224, Asn 729 and Tyr 733, hydrogen bonding between Asp 224 and both Asn 130 and Asn 729 is the major contributor to the stability of the pocket. Other contributions come from the electrostatic interactions between Arg 340 and both Asp 224 and Asn 130. In the protonated and mutant receptor models, Arg 340 is oriented outside the polar pocket owing to the loss of this hydrogen-bonding network. Mutations of Asn 130 to alanine, aspartate, leucine and phenylalanine were constructed using the average structure of the wild-type receptor.
The N130A mutation moved helix 2 away from helix 1 and oriented Arg 340 outside the polar pocket, mainly because of the loss of the Asn 130-Asp 224 interaction. The N130D mutation increased the stability of the polar pocket, as Arg 340 now forms hydrogen bonds with Asp 130 and with an arginine and a lysine on helix 6. The remaining two mutations, N130L and N130F, caused complex structural changes that affected the interaction with helix 6 and its C-terminal helical extension; it is unclear whether these two mutations would cause constitutive activation or instead impair activation. The models did, however, clearly predict that N130D would not be constitutively active whereas N130A would. Site-directed mutagenesis confirmed that N130A is constitutively active, increasing basal intracellular IP3 levels 2-fold, whereas N130D showed no constitutive activation, demonstrating the predictive power of these computational methods. Agreement between modelling and experimental studies such as this strongly supports the validity of molecular modelling of GPCRs. A recent review of the conformational changes during receptor activation by Hulme et al 96 also concluded that rotation and a small translation of helix 3, which disrupted the interactions between helices 4, 5, 6 and 7, resulted in receptor activation. It was therefore proposed that helix 3 acts as a rotational switch, which upon ligand binding undergoes a conformational change that is propagated throughout the receptor.
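The averaging protocol described above (150 ps of dynamics, the last 100 ps sampled every 0.5 ps, giving 200 structures that are averaged and then minimised) can be sketched as follows. The source does not say whether frames were superimposed before averaging; this sketch assumes a least-squares (Kabsch) fit of every frame onto a common reference, which prevents overall rotation and translation from blurring the average. The function names are illustrative.

```python
import numpy as np


def sampling_times(total_ps=150.0, collect_ps=100.0, dt_ps=0.5):
    """Times (ps) of the frames used for averaging: the last `collect_ps` of the run."""
    n_frames = int(round(collect_ps / dt_ps))  # 100 ps / 0.5 ps = 200 frames
    return (total_ps - collect_ps) + dt_ps * np.arange(1, n_frames + 1)


def kabsch_fit(mobile, reference):
    """Superimpose `mobile` onto `reference` by least-squares rotation and translation."""
    mob_centre, ref_centre = mobile.mean(axis=0), reference.mean(axis=0)
    p, q = mobile - mob_centre, reference - ref_centre
    u, _, vt = np.linalg.svd(p.T @ q)
    # Correct for a possible reflection so that the result is a proper rotation
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + ref_centre


def average_structure(frames, reference=None):
    """Mean Cartesian coordinates after fitting every frame to a common reference."""
    frames = np.asarray(frames, float)
    ref = frames[0] if reference is None else np.asarray(reference, float)
    fitted = np.array([kabsch_fit(frame, ref) for frame in frames])
    return fitted.mean(axis=0)
```

Averaged Cartesian coordinates can contain distorted bond lengths and angles, particularly for mobile side chains, which is why such an average structure is energy minimised before being compared with others.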
Receptor-G-protein interaction
The final step of GPCR activation is the interaction with, and activation of, an associated G-protein. In the absence of an atomic-resolution model of an activated receptor, it is difficult to characterise the GPCR-G-protein interaction fully. However, a number of experimental and computational studies have attempted to identify specific residues and regions within the receptors and G-proteins involved in this interaction.
In early modelling work by Mahmoudian 97, the complex of the human Gsα subunit with the β2-adrenergic receptor was studied. The Gsα sequence was aligned with those of the E. coli EF-Tu nucleotide-binding domain, ras-p21 and human rod transducin; the human sequence of transducin was also added to balance the alignment. The crystal structure of the E. coli EF-Tu nucleotide-binding domain was used as the backbone for the Gs model, with its amino acid sequence mutated to the Gs sequence. This produced the core of the Gs protein, to which a number of loops were added using the COMPOSER program 98,99. Regions for which no loop structure could be added, such as the N-terminus, were modelled on the basis of the predicted secondary structure. The final structure then underwent a molecular dynamics simulation with a weak harmonic constraint to remove any bad contacts, followed by a second simulation without constraints to obtain a realistic structure. The receptor was constructed by homology modelling on the bacteriorhodopsin template, as described by Trumpp-Kallmeyer et al 12. The receptor-G-protein complex was built using interactive molecular graphics on the basis of two experimental constraints: first, the C-terminus of G-proteins is involved in binding to the receptor 100; second, il3 and the C-terminus of the receptor are also involved in binding. Only one arrangement of receptor and G-protein was found that satisfied both constraints and was energetically stable according to molecular dynamics and energy minimisation calculations.
Investigation of the complex suggested a possible explanation for why deletion of residues 222-229 and 258-270 in the loop region of the β2-adrenergic receptor, which correspond to the N-terminal and C-terminal domains of il3, reduces the receptor's ability to activate Gs-mediated adenylyl cyclase. A hydrogen bond was found between a highly conserved glutamine, Q390, in the G-protein α subunit and a histidine in the C-terminal domain of the receptor. A second hydrogen bond was identified between Q227 of the Gs α subunit, believed to be important for the activity of Gs, and E248 in il3. The mutations C327R and C341G in the C-terminal domain of the β2-adrenergic receptor have been shown to reduce its ability to activate adenylyl cyclase. In the β2-adrenergic receptor-Gs complex, C327 is located near the receptor-G-protein interface, where it forms a hydrogen bond with a nearby arginine. The mutation C327R is believed to cause repulsion between the introduced arginine and this nearby arginine, thus changing the conformation of the activated receptor and inhibiting activation. The solution of the crystal structures of transducin and other G-proteins, bound to βγ subunits, adenylyl cyclase or RGS proteins, has aided the study of the receptor-G-protein interaction and has largely removed the need to model the G-protein by homology. A recent study by Oliveira et al 110 produced a model for the interaction between receptors and G-proteins, based on the idea that conserved regions of the GPCR interact with conserved regions of the G-protein. Sequence analysis of the Class A receptors revealed at least one conserved residue in each helix. In the alignment, Arg 340, part of the DRY sequence motif, and the tyrosine of the NPXXY sequence motif were found to be essentially 100% conserved, and so are thought to have a functional role.
This arginine was investigated computationally by Fanelli 74,75 and Scheer 90, who described its movement out of the polar pocket upon receptor activation, implicating it in G-protein binding and activation. The role of the NPXXY tyrosine is less clear, as the evidence appears contradictory. Sequence analysis of the Gα chains by Oliveira et al 110 identified 12 fully conserved residues, the majority of which were found to be involved in the GDP/GTP binding site. A correlated mutation analysis of these sequences identified groups of residues that had remained conserved or mutated together. These groups included residues in the β2/β3 loop and in helices α5 and αN. Mapping these residues onto the crystal structure showed that they cluster around a conserved, negatively charged aspartate at position 337, which is also positioned so that it can interact with the receptor. It was therefore suggested that this conserved negative charge and the surrounding residues form the binding site for Arg 340 of the receptor. Further evidence for the involvement of this region came from deletion experiments, which showed that positions 337-340 are involved in the release of GDP: deletion of this region, but not of the 341-350 region, resulted in GDP release. Oliveira et al 110 suggested that it is the perturbation of this arginine-binding pocket, either by binding of Arg 340 upon receptor association or by deletion of the region, that causes GDP release. This perturbation destabilises the α5/β6 loop and the α5 helix. Oliveira et al 110 also assumed that the C-terminus of the receptor interacts with the αN helix and possibly with the 10 C-terminal residues of the G-protein; the structure of these final 10 residues, however, is unknown, as they are not visible in electron density maps. A number of structures have been proposed for these residues, but Oliveira et al suggested that the C-terminus becomes ordered during receptor-G-protein coupling. This agreed well with their model, as a helix-helix interaction could arise between a number of hydrophobic residues without specific residue-residue interactions. From their model, it was established that this interaction moves the region following the helix in the receptor's C-terminus away from the complex, making it more accessible to other proteins such as kinases.
This is in good agreement with studies by Palczewski et al 111. Fanelli et al 51 extended their work on how constitutively active mutants induce the α1b-adrenergic receptor to adopt an active structure by studying its shape and electrostatic complementarity with the G-protein heterotrimer Gαqβ1γ2. Comparison of the cytosolic side of the constitutively active mutant receptors and the agonist-bound receptors shows that they share a common opening between il1 and il2, and also between il2 and il3, resulting in a large solvent-accessible surface area. To determine whether this exposure is important in receptor-G-protein recognition, the wild type, the D142A mutant and the agonist-bound receptors were docked as rigid bodies with the Gαqβ1γ2 heterotrimer using the ESCHER program 112. Docking the wild-type receptor with the G-protein gave unlikely structures for this complex, mainly because il1 is buried by the remaining intracellular loops. A number of plausible structures were, however, obtained for the active receptor structures, indicating that opening of the cytosolic domain may be a crucial step in receptor-G-protein recognition. Because the rigid-body docking allows no conformational changes arising from protein-protein interactions, these complexes were analysed in terms of electrostatic complementarity rather than detailed intermolecular interactions. The best structures were obtained for the D142A mutant receptor-G-protein complex, which showed the best complementarity. In these structures, the positively charged side of the G-protein N-terminal α-helix faces the negatively charged surface of the membrane. In this orientation, the C-terminus of the receptor, which is not included in these models, may also be able to interact with the β subunit and the N-terminus of the α subunit. The solvent-exposed cytosolic domain could then dock in one of two ways. In the first, il2 interacts with the α4/β6 loop and the C-terminus of the α5 helix, residues 252-258 of il3 interact with the N-terminus of the α3 helix, and the extension of helix 5 and the C-terminus of il3 interact with the α4/β6 loop. In the second, il2 interacts with the α4/β6 loop and the C-terminus of α5, while residues 252-258 of il3 interact with αG and the N-terminal and middle portions of the α5 helix. The first orientation was found to be more suitable, as significantly more docked structures adopted it and it also allowed the positive electrostatic potential surface of the cytosolic domain to complement the negatively charged surface of the Ras-like domain. This work therefore indicated the importance of residues 252-259 of il3 for receptor-G-protein recognition and activation. It also indicated the importance of the opening of the cytosolic domain between il2 and il3, which allows the receptor to interact with important contact domains on the G-protein, namely the C-terminus, the α3 and α5 helices and the α4/β6 and αG/β4 loops. This agrees with Oliveira et al 110, who postulate that perturbation of the α5 helix and the α5/β6 loop (which is connected to the α4/β6 loop), possibly caused by receptor association, may result in the release of GDP and thus in G-protein activation. Sequence analysis has become a powerful tool for identifying functional domains in proteins. One such tool is the Evolutionary Trace (ET) method developed and implemented by Lichtarge et al 113-115. One ET study examined the binding surfaces of the G-protein α subunits 114. The 112 available sequences were aligned and a dendrogram was generated. At each partition in the tree, a new grouping was generated, and within each group the conserved residues were identified.
Initially the percentage identity cut-off (PIC) is 0, and all sequences are analysed together before any partition. Moving along the tree increases the number of partitions, and hence the PIC value, which is related to the distance moved along the tree, while each partition contains fewer sequences. Residues identified at PIC 0 are therefore fully conserved throughout the entire family and are expected to have the same function throughout it, whereas residues identified at higher PIC values are conserved only within their subfamily (i.e. conserved within a class) and are expected to have a subfamily-specific function. The results were mapped onto the van der Waals surface of the α subunit structure solved by Lambright et al. Two surface clusters were identified, on opposite sides of the subunit. The first cluster, of 17 residues, lay around the C-terminus, suggesting a possible contact region for the receptor; many of these residues have also been implicated by experimental studies 116-119. The second cluster, of 32 residues, stretched from the membrane face to the nucleotide-binding cleft and contained the three switch regions. Comparison with the crystal structure identified this region as the binding site for Gβγ. The analysis of Gβ revealed two clusters, matching its contact surfaces with Gα and Gγ, confirming the ability of the ET method to identify functional domains and protein-protein interfaces. On the basis of these results, combined with experimental data, a model of the receptor-G-protein complex was constructed. The receptor was oriented so that il3 and il4 (the latter formed by palmitoylation of two cysteines in the C-terminus) were within contact distance of Gβγ. This left the first cluster as the primary interaction site for il1 and il2. Analysis of these results in the light of recent work on GPCR dimerisation has led to some interesting conclusions (see below).
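The core bookkeeping of an ET-style analysis can be sketched as follows: partition the aligned sequences at a given PIC, then report positions that are invariant in every sequence, and positions that are invariant within each group while differing between groups. This is a simplified illustration, not Lichtarge's implementation: it assumes that grouping sequences by single linkage on pairwise percent identity is an acceptable stand-in for cutting the dendrogram, and it omits the ranking and surface-mapping steps. Function names are illustrative.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity over aligned positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def partition(seqs, pic):
    """Group sequences linked (single linkage) by identity >= pic.
    At pic = 0 every sequence falls into one group, as at PIC 0 in ET."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if percent_identity(seqs[i], seqs[j]) >= pic:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def trace_residues(seqs, pic):
    """Return (family-invariant positions, class-specific positions) at this PIC."""
    groups = partition(seqs, pic)
    invariant, class_specific = [], []
    for pos in range(len(seqs[0])):
        if len({s[pos] for s in seqs}) == 1:
            invariant.append(pos)
        elif all(len({seqs[i][pos] for i in g}) == 1 for g in groups):
            class_specific.append(pos)
    return invariant, class_specific


seqs = ["MKLVA", "MKLVS", "MRIVA", "MRIVS"]
print(trace_residues(seqs, 70.0))  # two subfamilies: {0, 1} and {2, 3}
```

Groups containing a single sequence are trivially invariant at every position, so class-specific residues only become meaningful when each partition still holds enough sequences, which is the practical limit on how far along the tree the analysis can usefully go.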
GPCR dimerisation
Dimerisation has recently been reported for a number of Class A 120-125, Class B 126 and Class C 127-131 G-protein coupled receptors, suggesting that dimerisation is important for the function of these receptors. Gouldson et al 48 proposed a mechanism of receptor activation involving domain swapping. Perhaps the most interesting results were obtained by Maggio et al 123 on chimeras of the muscarinic M3 and α2-adrenergic receptors. The chimeric receptor containing the region from the N-terminus through il3 of the muscarinic receptor and the remainder from the adrenergic receptor neither bound ligand nor activated the G-protein, and neither did the complementary chimera. Coexpression of both chimeras, however, restored both ligand binding and activation. On the basis of these results, both correlated mutation analysis (CMA) and molecular dynamics simulations were used to analyse the proposed mechanism of dimer formation 48. A model of the β2-adrenergic receptor was constructed on the basis of the projection structure published by Unger et al 27. Three possible dimer arrangements were built from the β2-adrenergic receptor: a 1,2-dimer, a 1,7-dimer and a 5,6-domain-swapped dimer, where the numbers denote the helices at the dimer interface. A single ligand was docked into one half of each receptor dimer, and the complexes underwent energy minimisation and molecular dynamics simulations of up to 450 ps. Plots of potential energy against simulation time revealed that the apo 1,2- and 1,7-dimers were both significantly lower in energy than the 5,6-dimer, as were the corresponding antagonist-bound dimers. When an agonist was docked, however, the energy of the 5,6-dimer was significantly lowered relative to the other structures. This is consistent with the idea that agonist-induced activation is caused by a shift in the equilibrium towards the 5,6-dimer.
Here it should be noted that the simulations were performed on a dimer model without loops. Consequently, the 5,6-contact dimer and the 5,6-domain-swapped dimer are identical, so none of the molecular dynamics simulations, the CMA or the ET results can distinguish between them. The arrangement of the helices in the 5,6-contact dimer and the 5,6-domain-swapped dimer is shown in figure 8.
Figure 8. The possible arrangement of helices in a 5,6-domain-swapped dimer (left) and a 5,6-contact dimer (right). The positions of the helices are identical in each arrangement.
A number of residues in the putative 5,6-dimer interface have been identified by site-directed mutagenesis. The mutation of these residues was modelled using free energy simulations 132, in order to assess whether the model was consistent with these findings 48. The G276A, G280A and L284A mutations were carried out using a windowing approach with 21 windows, so that each step between successive windows advanced the mutation by 5%. The Y209A mutation was more difficult because the side chain shrinks substantially as the simulation proceeds. This mutation was therefore carried out in three stages: (i) Tyr→Phe, (ii) Phe→Ala* and (iii) Ala*→Ala, where Ala* is an alanine residue with a large Cβ, so that there is essentially no volume change in the Phe→Ala* step. The G276A, G280A, L284A and Y209A mutations all resulted in a positive free energy change. These results are consistent with the idea that these mutations destabilise the dimer interface, and hence that they inhibit G-protein activation because the receptor can no longer form dimers 48.
CMA of a number of Class A sequences revealed a number of internal residues, which relate to the ligand-binding domain in these receptors. However, a number of residues identified by CMA were found on the external face of the receptor, many of them on helices 5 and 6. Pazos et al. 133 suggested that correlated residues are involved in protein-protein interfaces, supporting the idea of a protein-protein interface involving these helices. The remaining external correlated residues, on helices 1, 2 and 7, were thought to be involved in the formation of a 1,7-dimer intermediate or of higher-order oligomers 48. The formation of receptor dimers has also been investigated by Fanelli et al. 5~, who used rigid-body docking to simulate the dimerisation of their models. The best resulting dimer structures for the D142A mutant and epinephrine-bound receptors were those involving helices 5, 6 and 7, which is largely consistent with the findings of Gouldson et al. 48. A number of residues were identified as possibly being involved in receptor dimerisation, including several identified previously by Gouldson et al. On this basis, it was suggested that dimerisation occurs through intermolecular interactions between hydrophobic residues. We have applied the ET method to the GPCR superfamily and reapplied it to the G-proteins 46,134. The analysis not only identified residues involved in ligand binding and helix-helix packing but also identified a number of external residues, in a similar fashion to the correlated mutation analysis.
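The 21-window free energy simulations of the G276A, G280A and L284A mutations described above can be outlined in Python. This is a minimal sketch assuming the exponential-averaging (Zwanzig) estimator of free energy perturbation; the source does not state which estimator was used, kT is an assumed value near 300 K, and the energy-difference samples are invented so that the answer is known in advance. A staged mutation such as Y209A would add the totals of its three legs.

```python
import math

KT = 0.596  # kcal/mol at about 300 K (assumed temperature)


def lambda_schedule(n_windows: int = 21) -> list[float]:
    """Evenly spaced coupling parameters: 0 = wild type, 1 = mutant."""
    return [i / (n_windows - 1) for i in range(n_windows)]


def zwanzig_step(delta_u: list[float], kt: float = KT) -> float:
    """Free energy of one lambda step, -kT ln <exp(-dU/kT)>, from sampled dU values."""
    mean = sum(math.exp(-du / kt) for du in delta_u) / len(delta_u)
    return -kt * math.log(mean)


def fep_total(samples_per_step: list[list[float]], kt: float = KT) -> float:
    """Sum the step free energies along the whole lambda path."""
    return sum(zwanzig_step(samples, kt) for samples in samples_per_step)


lambdas = lambda_schedule()
steps = len(lambdas) - 1          # 20 steps between 21 windows
print(lambdas[1] - lambdas[0])    # each step covers 5% of the mutation
# Hypothetical dU samples: a constant 0.1 kcal/mol per step should sum to about 2.0.
print(fep_total([[0.1] * 50 for _ in range(steps)]))
```

In practice each step would use energy differences sampled from a simulation at that λ, and the sign of the final total (relative to a reference leg) indicates whether the mutation stabilises or destabilises the interface.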
Clear external functional sites were identified on helices 5 and 6 and also on helices 2 and 3 (figure 9), with comparatively few residues identified on the external faces of helices 1, 4 and 7. Again, the analysis highlights a possible role for helices 5 and 6 in dimerisation, but the identification of an interaction site on helices 2 and 3 raises the question of what role this second site plays. It may be involved in the formation of 2,3-dimers, although this type of dimer has very little support in the literature. If the 2,3-dimer is possible, then dimerisation could occur at either end of the receptor, possibly resulting in trimers or linear oligomers. However, a more likely explanation is that the functional sites on helices 2 and 3 are involved in interactions with other proteins such as RAMPs.
Figure 9. The residues identified by the Evolutionary Trace (ET) method on the external face of the receptor. In each orientation the helix/helices in focus are coloured mid grey while the ET residues in focus are coloured light grey. The remaining residues are dark grey. a) helices 5 and 6, b) helix 4, c) helices 2 and 3, d) helix 7.
There are clear relationships between the CMA and ET analyses; however, the ET method appears to give clearer results. Consequently, the ET method was used to analyse the G-protein family in the light of the growing evidence for dimerisation. A comparison between the original results of Lichtarge et al. and our results revealed a GPCR binding site about twice the size of that previously identified and, for βγ binding, we also found a larger site. However, the crystal structures of Gα with adenylyl cyclase 1~ and with RGS4 1~ revealed that the additional ET residues on the βγ binding site are associated with the binding of these proteins. Electrostatic potential plots of several G-proteins, calculated using the Poisson-Boltzmann method 135 and displayed using gOpenMol 136, also suggested that dimers could interact with the G-protein, as two negatively charged potential wells can be clearly identified on its surface. This second GPCR binding site has not been identified in previous modelling studies. In the chimeric α-subunit, the position of the N-terminus holds all but the Ras-like domain away from the membrane, leaving the second possible interaction site too far away from the receptor loops. However, examination of the structural changes brought about by RGS binding revealed a significant change in the orientation of the N-terminus, possibly causing not only the Ras-like domain but the entire face of the G-protein to face the membrane, as shown in figure 10. The two sets of ET results are shown in figure 11A: residues identified in both analyses are shown in dark grey, while the new results are shown in light grey. It is difficult to see how this second binding site could interact with the receptor, since the requirement for the N-terminus to be attached to the membrane 137-139 keeps this site too far from the membrane, possibly explaining why Lichtarge et al. 114 did not report these residues in their early work. However, the structure of transducin bound to RGS4 1~ shows a conformational change that re-orientates this key binding site with respect to the N-terminus.
An additional effect of this transition is that the βγ subunits would not be held so tightly. We are therefore able to tentatively propose a mechanism of G-protein activation that involves an electrostatic attraction between the G-protein and the receptor dimer that is sufficiently strong to cause the conformational change seen in the transducin-RGS complex, and hence to initiate the signalling process by the release of βγ. This mechanism is shown schematically in figure 10C. Recently, however, it was shown that an antibody directed against residues 100-119 of the α-subunit of Gs mimicked the effects of receptor binding, since the antibody was found to stimulate adenylyl cyclase 14~. The residues identified experimentally as being involved in receptor binding are also shown in figure 11. However, it must be stressed that the evidence that GPCR dimers play a role in G-protein activation is not definitive, and it may be necessary to look for some other functional role for GPCR dimers.
Figure 10. The structural changes in the N-terminus of the G-protein α-subunit. The α-subunit in (A) its inactive form when bound to the βγ dimer, (B) its active form when bound to RGS4. (C) A schematic diagram to illustrate the possible role of this movement in activation.
Figure 11. The ET results (A) and residues implicated in receptor binding from experimental data (B) for the G-protein. A) The residues identified in both analyses are shown in dark grey, while the residues identified only in the new analysis are shown in light grey. B) Residues identified in receptor binding ~4~ are shown in light grey.
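The negatively charged potential wells discussed above come from full Poisson-Boltzmann calculations. As a much cruder illustration, the Python sketch below evaluates the Debye-Hückel (linearised Poisson-Boltzmann) screened Coulomb potential of point charges in a uniform dielectric. It ignores the low-dielectric protein interior and ion exclusion that a real Poisson-Boltzmann solver handles; the charges, coordinates, ionic strength and the function names are hypothetical.

```python
import math

COULOMB = 332.0636  # kcal mol^-1 Å e^-2


def debye_length(ionic_strength: float) -> float:
    """Debye screening length (Å) in water at 25 °C; ionic strength in mol/L."""
    if ionic_strength <= 0:
        return math.inf  # no salt: unscreened Coulomb
    return 3.04 / math.sqrt(ionic_strength)


def screened_potential(charges, probe, ionic_strength=0.15, eps_r=80.0):
    """Debye-Hückel potential (kcal mol^-1 e^-1) at `probe`.

    charges: iterable of (q, (x, y, z)) with q in units of e and coordinates in Å.
    The probe must not coincide with a charge position.
    """
    lam = debye_length(ionic_strength)
    total = 0.0
    for q, pos in charges:
        r = math.dist(pos, probe)
        total += q * math.exp(-r / lam) / r  # screened 1/r contribution
    return COULOMB * total / eps_r


# Hypothetical acidic patch of three carboxylates on a protein surface.
patch = [(-1.0, (0.0, 0.0, 0.0)), (-1.0, (4.0, 0.0, 0.0)), (-1.0, (0.0, 4.0, 0.0))]
above = (1.0, 1.0, 5.0)
print(screened_potential(patch, above))                      # negative well
print(screened_potential(patch, above, ionic_strength=0.0))  # deeper without salt
```

Sampling such a function over a grid of probe points is the basic idea behind a potential map; salt screening makes the wells shallower but does not change their sign.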
Conclusions
A great many studies, both experimental and theoretical, have been carried out on the GPCR protein family, and much progress has been made. Since the early 1990s, the projection maps of Unger et al., Schertler et al. and Krebs et al. have increased the accuracy of theoretical models of GPCRs, which were previously based on the structure of bacteriorhodopsin. This increase in model accuracy, coupled with the increase in computational power, has allowed more detailed theoretical studies of agonist and antagonist binding and the resulting structural changes. Many of these studies have revealed different binding modes for antagonists and agonists, resulting in substantially different structural changes in the receptor, as seen in the work of Gouldson et al. 35 and Zhang et al. 19. Moreover, many of the residues predicted by modelling to be involved in ligand binding have been confirmed experimentally by site-directed mutagenesis, with much of this experimental data accessible through the World Wide Web at the GRAP and tinyGRAP databases. Modelling has also allowed a great deal of experimental observations to be explained at a molecular level. Fanelli et al. 51,52,74,75,90 produced a number of theoretical studies on the effect of constitutively active mutants. Molecular dynamics simulations on mutant receptors revealed that these mutants caused a rotation of helix 3, disrupting a hydrogen-bond network in the receptor. Not only did these simulations offer an explanation of the mechanism of constitutive activation, but they were also in agreement with a number of spin-labelling studies suggesting that movement of this helix is a crucial step in receptor and G-protein activation. In the absence of a crystal structure, sequence data have been increasingly used to predict the function of specific residues. Multiple sequence alignments of these proteins, available from the GPCRDB maintained by G. Vriend, combined with tools such as correlated mutation analysis and the evolutionary trace method, have revealed the role of a large number of residues in ligand binding, receptor activation and protein-protein interactions. Indeed, both CMA and ET have revealed residues on the external face of the receptor and the existence of two possible receptor binding regions on the G-protein, which point to a possible mechanism of receptor activation via dimerisation, a mechanism supported by an ever-increasing amount of experimental data. Overall, the modelling of these receptors is clearly a complex challenge. However, many of the studies reviewed here have been able to explain a large amount of experimental data using solely theoretical models and sequence analysis techniques. In addition, much of this theoretical work has become a predictive tool, offering possible mechanisms for ligand binding and for receptor and G-protein activation. Modelling has become an integral part of the study of these proteins and should continue to provide explanations for future experimental observations.
References
1. Wess, J. Molecular basis of receptor/G-protein-coupling selectivity. Pharmacol. Ther., 1998, 80, 231-264.
2. Watson, S., Arkinstall, S. The G-protein linked receptor facts book, Academic Press, London, 1994.
3. Gudermann, T., Nurnberg, B., Schultz, G. Receptors and G-proteins as primary components of transmembrane signal transduction 1: G-protein-coupled receptors - structure and function. J. Mol. Med., 1995, 73, 51-63.
4. Clapham, D. E., Neer, E. J. New roles for G-protein βγ-dimers in transmembrane signalling. Nature, 1993, 365, 403-406.
5. Belrhali, H., Nollert, P., Royant, A., Menzel, C., Rosenbusch, J. P., Landau, E. M., Pebay-Peyroula, E. Protein, lipid and water organisation in bacteriorhodopsin: A molecular view of the purple membrane at 1.9 Å resolution. Structure with Folding and Design, 1999, 7, 909-917.
6. Grisshammer, R., Tate, C. G. Overexpression of integral membrane proteins for structural studies. Q. Rev. Biophys., 1995, 28, 315-422.
7. Kuhlbrandt, W. Two-dimensional crystallization of membrane proteins. Q. Rev. Biophys., 1992, 25, 1-49.
8. Henderson, R., Baldwin, J. M., Ceska, T. A., Zemlin, F., Beckmann, E., Downing, K. H. Model for the structure of bacteriorhodopsin based on high-resolution electron cryo-microscopy. J. Mol. Biol., 1990, 213, 899-929.
9. Nordvall, G., Hacksell, U. Binding-site modelling of the Muscarinic m1 receptor: A combination of homology-based and indirect approaches. J. Med. Chem., 1993, 36, 967-976.
10. Cronet, P., Sander, C., Vriend, G. Modelling of transmembrane seven helix bundles. Protein Engineering, 1993, 6, 59-64.
11. Yamamoto, Y., Kamiya, K., Terao, S. Modelling of human Thromboxane A2 receptor and analysis of the receptor-ligand interaction. J. Med. Chem., 1993, 36, 820-825.
12. Trumpp-Kallmeyer, S., Hoflack, J., Bruinvels, A., Hibert, M. Modelling of G-protein-coupled receptors: Applications to Dopamine, Adrenaline, Serotonin, Acetylcholine and Mammalian Opsin receptors. J. Med. Chem., 1992, 35, 3448-3462.
13. Livingstone, C. D., Strange, P. G., Naylor, L. H. Molecular modelling of D2-like dopamine receptors. Biochem. J., 1992, 287, 277-282.
14. Oliveira, L., Paiva, A. C. M., Vriend, G. A common motif in G-protein-coupled seven transmembrane helix receptors. J. Computer-Aided Mol. Design, 1993, 7, 649-658.
15. Mahmoudian, M. The cannabinoid receptor: Computer-aided molecular modelling and docking of ligand. J. Mol. Graphics Mod., 1997, 15, 149-153.
16. Dahl, S. G., Edvardsen, O., Sylte, I. Molecular dynamics of dopamine at the D2 receptor. Proc. Natl. Acad. Sci., 1991, 88, 8111-8115.
17. Deisenhofer, J., Epp, O., Miki, K., Huber, R., Michel, H. Structure of the protein subunits in the photosynthetic reaction centre of Rhodopseudomonas viridis at 3 Å resolution. Nature, 1985, 318, 618-624.
18. Maloney-Huss, K., Lybrand, T. P. Three-dimensional structure for the β2 Adrenergic receptor protein based on computer modelling studies. J. Mol. Biol., 1992, 225, 859-871.
19. Zhang, D., Weinstein, H. Signal transduction by a 5-HT2 receptor: A mechanistic hypothesis from molecular dynamics simulations of the three-dimensional model of the receptor complexed to ligands. J. Med. Chem., 1993, 36, 934-938.
20. Herzyk, P., Hubbard, R. E. Automated method for modelling seven-helix transmembrane receptors from experimental data. Biophys. J., 1995, 69, 2419-2442.
21. Peitsch, M. C., Herzyk, P., Wells, T. N. C., Hubbard, R. E. Automated modelling of the transmembrane region of G-protein coupled receptor by Swiss-Model. Receptors and Channels, 1996, 4, 161-164.
22. http://www.expasy.ch/swissmod/SWISS-MODEL.html
23. Prusis, P., Schiöth, H. B., Muceniece, R., Herzyk, P., Afshar, M., Hubbard, R. E., Wikberg, J. E. S. Modelling of the three-dimensional structure of the human melanocortin 1 receptor, using an automated method and docking of a rigid cyclic melanocyte-stimulating hormone core peptide. J. Mol. Graphics Mod., 1998, 15, 307-317.
24. Schertler, G. F. X., Villa, C., Henderson, R. Projection structure of rhodopsin. Nature, 1993, 362, 770-772.
25. Henderson, R., Baldwin, J. M., Ceska, T. A., Zemlin, F., Beckmann, E., Downing, K. H. Model for the structure of bacteriorhodopsin based on high resolution cryomicroscopy. J. Mol. Biol., 1990, 213, 899-929.
26. Unwin, P. N., Henderson, R. Molecular structure determination by electron microscopy of unstained crystalline specimens. J. Mol. Biol., 1975, 94, 425-440.
27. Unger, V. M., Schertler, G. F. X. Low resolution structure of bovine rhodopsin determined by electron cryo-microscopy. Biophys. J., 1995, 68, 1776-1786.
28. Baldwin, J. M. The probable arrangement of the helices in G-protein-coupled receptors. EMBO J., 1993, 12, 1693-1703.
29. Gouldson, P. R., Bywater, R. P., Reynolds, C. A. Correlated mutations and subtype specificity in the adrenergic receptor. Biochem. Soc. Trans., 1997, 25, 434S.
30. Ferenczy, G. G., Winn, P. J., Reynolds, C. A. Towards improved force fields. II. Effective distributed multipoles. J. Phys. Chem. A, 1997, 101, 5446-5455.
31. Liu, J., Schöneberg, T., Van Rhee, M., Wess, J. Mutational analysis of the relative orientation of transmembrane helix 1 and helix 7 in G-protein coupled receptors. J. Biol. Chem., 1997, 270, 19532-19539.
32. Mizobe, T., Maze, M., Lam, V., Suryanarayana, S., Kobilka, B. K. Arrangement of transmembrane domains in adrenergic receptors: Similarity to bacteriorhodopsin. J. Biol. Chem., 1996, 271, 2387-2389.
33. Elling, C. E., Schwartz, T. W. Connectivity and orientation of the seven helical bundle in the tachykinin NK-1 receptor probed by zinc site engineering. EMBO J., 1996, 15, 6213-6219.
34. Fu, D. Y., Ballesteros, J. A., Weinstein, H., Chen, J. Y., Javitch, J. A. Residues in the seventh membrane-spanning segment of the dopamine D2 receptor accessible in the binding-site crevice. Biochemistry, 1996, 35, 11278-11285.
35. Gouldson, P. R., Snell, C. R., Reynolds, C. A. A new approach to docking in the β2-adrenergic receptor that exploits the domain structure of G-protein coupled receptors. J. Med. Chem., 1997, 40, 3871-3886.
36. Konvicka, K., Guarnieri, F., Ballesteros, J. A., Weinstein, H. A proposed structure for transmembrane segment 7 of G protein-coupled receptors incorporating an Asn-Pro/Asp-Pro motif. Biophys. J., 1998, 75, 601-611.
37. Cai, K., Klein-Seetharaman, J., Farrens, D., Zhang, C., Altenbach, C., Hubbell, W. L., Khorana, H. G. Single cysteine substitution mutants at amino acid positions 306-321 in rhodopsin, the sequence between the cytoplasmic end of helix VII and the palmitoylation sites: Sulfhydryl reactivity and transducin activation reveal a tertiary structure. Biochemistry, 1999, 38, 7925-7930.
38. Altenbach, C., Cai, K., Khorana, H. G., Hubbell, W. L. Structural features and light-dependent changes in the sequence 306-322 extending from helix VII to the palmitoylation sites in rhodopsin: A site-directed spin-labelling study. Biochemistry, 1999, 38, 7931-7937.
39. Yeagle, P. L., Alderfer, J. L., Albert, A. D. Structure determination of the fourth cytoplasmic loop and carboxyl terminal domain of bovine rhodopsin. Mol. Vis., 1996, 2, 12.
40. Donnelly, D., Findlay, J. B. C., Blundell, T. L. The evolution and structure of aminergic G-protein coupled receptors. Receptors and Channels, 1994, 2, 61-78.
41. Fanelli, F., Menziani, M. C., Cocchi, M., De Benedetti, P. G. Comparative molecular dynamics study of the seven-helix bundle arrangement of G-protein coupled receptors. J. Mol. Struc. (Theochem), 1995, 333, 49-69.
42. Strahs, D., Weinstein, H. Comparative modelling and molecular dynamics studies of the δ, κ and μ opioid receptors. Protein Engineering, 1997, 9, 1019-1038.
43. Bramblett, R. D., Panu, A. M., Ballesteros, J. A., Reggio, P. H. Construction of a 3D model of the cannabinoid CB1 receptor: Determination of helix ends and helix orientation. Life Sciences, 1995, 56, 1971-1982.
44. Unger, V. M., Hargrave, P. A., Baldwin, J. M., Schertler, G. F. X. Arrangement of rhodopsin transmembrane α-helices. Nature, 1997, 389, 203-206.
45. Baldwin, J. M., Schertler, G. F. X., Unger, V. M. An alpha-carbon template for the transmembrane helices in the rhodopsin family of G-protein coupled receptors. J. Mol. Biol., 1997, 272, 144-164.
46. Gkoutos, G. V., Higgs, C., Bywater, R. P., Gouldson, P. R., Reynolds, C. A. Evidence for dimerisation in the β2-adrenergic receptor from the evolutionary trace method. Int. J. Quant. Chem., 1999, 74, 371-379.
47. Donnelly, D., Maudsley, S., Gent, J. P., Moser, R. N., Hurrell, C. R., Findlay, J. B. C. Conserved polar residues in the transmembrane domain of the human tachykinin NK2 receptor: Functional roles and structural implications. Biochem. J., 1999, 339, 55-61.
48. Gouldson, P. R., Snell, C. R., Bywater, R. P., Higgs, C., Reynolds, C. A. Domain swapping in G-protein coupled receptor dimers. Protein Engineering, 1998, 11, 1181-1193.
49. Song, Z. H., Slowey, C. A., Hurst, D. P., Reggio, P. H. The difference between the CB1 and CB2 cannabinoid receptors at position 5.46 is crucial for the selectivity of WIN55212-2 for CB2. Mol. Pharmacol., 1999, 56, 834-840.
50. Tao, Q., McAllister, S. D., Andreassi, J., Nowell, K. W., Cabral, G. A., Hurst, D. P., Bachtel, K., Ekman, M. C., Reggio, P. H., Abood, M. E. Role of a conserved lysine residue in the peripheral cannabinoid receptor (CB2): Evidence for subtype specificity. Mol. Pharmacol., 1999, 55, 605-613.
51. Fanelli, F., Menziani, C., Scheer, A., Cotecchia, S., De Benedetti, P. G. Theoretical study on receptor-G-protein recognition: New insights into the mechanism of the α1b-adrenergic receptor activation. Int. J. Quant. Chem., 1999, 73, 71-83.
52. Fanelli, F., Menziani, C., Scheer, A., Cotecchia, S., De Benedetti, P. G. Ab initio modelling and molecular dynamics simulation of the α1b-adrenergic receptor activation. Methods: A Companion to Methods in Enzymology, 1998, 14, 302-317.
53. Horn, F., Weare, J., Beukers, M. W., Horsch, S., Bairoch, A., Chen, W., Edvardsen, O., Campagne, F., Vriend, G. GPCRDB: an information system for G protein-coupled receptors. Nucleic Acids Research, 1998, 26, 275-279.
54. GPCRDB: Information system for G-protein coupled receptors (GPCRs). European Molecular Biology Laboratory, Heidelberg, Germany (http://www.sander.embl-heidelberg.de/7tm/).
55. Schertler, G. F. X. Structure of rhodopsin. Eye, 1998, 12, 504-510.
56. Chothia, C., Levitt, M., Richardson, D. Helix to helix packing in proteins. J. Mol. Biol., 1981, 145, 215-250.
57. Krebs, A., Villa, C., Edwards, P. C., Schertler, G. F. X. Characterisation of an improved two-dimensional p22121 crystal from bovine rhodopsin. J. Mol. Biol., 1998, 282, 991-1003.
58. Findlay, J. B. C., Pappin, D. J. C. The Opsin family of proteins. Biochem. J., 1986, 238, 625-642.
59. Pellegrini, M., Mierke, D. F. Molecular complex of cholecystokinin-8 and N-terminus of the cholecystokinin A receptor by NMR spectroscopy. Biochemistry, 1999, 38, 14775-14783.
60. Mizoue, L. S., Bazan, J. F., Johnson, E. C., Handel, T. M. Solution structure and dynamics of the CX3C chemokine domain of fractalkine and its interaction with an N-terminal fragment of CX3CR1. Biochemistry, 1999, 38, 1402-1414.
61. Yeagle, P. L., Alderfer, J. L., Albert, A. D. Structure of the third cytoplasmic loop of bovine rhodopsin. Biochemistry, 1995, 34, 14621-14625.
62. Yeagle, P. L., Alderfer, J. L., Salloum, A. C., Ali, L., Albert, A. D. The first and second cytoplasmic loops of the G protein receptor, rhodopsin, independently form beta-turns. Biochemistry, 1997, 36, 3864-3869.
63. Yeagle, P. L., Alderfer, J. L., Albert, A. D. Three-dimensional structure of the cytoplasmic face of the G protein receptor rhodopsin. Biochemistry, 1997, 36, 9649-9654.
64. Bouvier, M., Moffett, S., Loisel, T. P., Mouillac, B., Hebert, T., Chidiac, P. Palmitoylation of G-protein coupled receptors: A dynamic modification with functional consequences. Biochem. Soc. Trans., 1995, 23, 116-120.
65. Kyle, D. J., Chakravarty, S., Sinsko, J. A., Stormann, T. M. A proposed model of bradykinin bound to the rat B2 receptor and its utility for drug design. J. Med. Chem., 1994, 37, 1347-1354.
66. Kristiansen, K., Dahl, S. G., Edvardsen, O. A database of mutants and effects of site-directed mutagenesis experiments on G-protein coupled receptors. Proteins: Struct. Funct. Genet., 1996, 26, 81-94.
67. Edvardsen, O., Kristiansen, K. Computerisation of mutant data: The tinyGRAP mutant database. 7TM Journal, 1997, 6, 1-6.
68. Gouldson, P. R., Winn, P. J., Reynolds, C. A. A molecular dynamics approach to receptor mapping: Application to the 5-HT3 and β2-adrenergic receptors. J. Med. Chem., 1995, 38, 4080-4086.
69. Goodford, P. J. A computational procedure for determining energetically favourable binding sites on biologically important macromolecules. J. Med. Chem., 1985, 28, 849-857.
70. Maggio, R., Vogel, Z., Wess, J. Reconstitution of functional muscarinic receptors by coexpression of amino-terminal and carboxyl-terminal receptor fragments. FEBS Lett., 1993, 319, 195-200.
71. Kobilka, B. K., Kobilka, T. S., Daniel, K., Regan, J. W., Caron, M. G., Lefkowitz, R. J. Chimeric alpha-2-adrenergic, beta-2-adrenergic receptors: delineation of domains involved in effector coupling and ligand-binding specificity. Science, 1988, 240, 1310-1316.
72. Maggio, R., Vogel, Z., Wess, J. Coexpression studies with mutant muscarinic/adrenergic receptors provide evidence for intermolecular cross-talk between G-protein-linked receptors. Proc. Natl. Acad. Sci. USA, 1993, 90, 3103-3107.
73. Almaula, N., Ebersole, B. J., Zhang, D., Weinstein, H., Sealfon, S. C. Mapping the binding site of the serotonin 5-Hydroxytryptamine2A receptor. J. Biol. Chem., 1996, 271, 14672-14676.
74. Fanelli, F., Menziani, M. C., De Benedetti, P. G. Computer simulations of signal transduction mechanism in α1B-adrenergic and m3-muscarinic receptors. Protein Engineering, 1995, 8, 557-564.
75. Fanelli, F., Menziani, M. C., De Benedetti, P. G. Molecular dynamics simulations of m3-muscarinic receptor activation and QSAR analysis. Bioorg. Med. Chem., 1995, 3, 1465-1477.
76. Goodsell, D. S., Olson, A. J. Automated docking of substrates to proteins by simulated annealing. Proteins: Struct. Funct. Genet., 1990, 8, 195-202.
77. Thomas, B. F., Compton, D. R., Martin, B. R., Semus, S. F. Modelling the cannabinoid receptor: A three-dimensional quantitative structure-activity analysis. Mol. Pharmacol., 1991, 40, 656-665.
78. Shire, D., Calandra, B., Kirneis, A., Delpech, M., Barth, F., Rinaldi-Carmona, M., LeFur, G., Ferrara, P. Studies of the binding sites of the CB2 specific antagonist SR144528 and of the agonist WIN55212-2, Symposium on the cannabinoids, Burlington, VT, International Cannabinoid Research Society, 1997, p. 50.
79. Reggio, P. H. Ligand-ligand and ligand-receptor approaches to modelling the cannabinoid CB1 and CB2 receptors: Achievement and challenges. Curr. Med. Chem., 1999, 6, 665-683.
80. Mouillac, B., Chini, B., Balestre, M. N., Elands, J., Trumpp-Kallmeyer, S., Hoflack, J. M., Hibert, M., Jard, S., Barberis, C. The binding site of Neuropeptide vasopressin V1a receptor: Evidence for a major localization within transmembrane region. J. Biol. Chem., 1995, 270, 25771-25777.
81. Kojro, E., Eich, P., Gimpl, G., Fahrenholz, F. Direct identification of an extracellular agonist binding site in the renal V2 Vasopressin receptor. Biochemistry, 1993, 32, 13537-13544.
82. Phalipou, S., Seyer, R., Cotte, N., Brenton, C., Barberis, C., Hibert, M., Mouillac, B. Docking of linear peptide antagonists into the human V1a vasopressin receptor: Identification of binding domains by photoaffinity labelling. J. Biol. Chem., 1999, 274, 23316-23327.
83. Saebo, S., Keene, E., Fang, T., Lynn, B. C., Hicks, R. P. Modelling of the recognition site of the NK1 receptor. J. Mol. Struc. (Theochem), 1996, 366, 65-77.
84. Insight II, Biosym Technologies, 9685 Scranton Road, San Diego, CA 92121, USA.
85. Huang, R. R. C., Yu, H., Strader, C. D., Fong, T. M. Interaction of Substance P with the 2nd and 7th transmembrane domains of the neurokinin-1 receptor. Biochemistry, 1994, 33, 3007-3013.
86. Kamiya, Y., Reynolds, C. A. Brownian dynamics simulations of the β2-adrenergic receptor extracellular loops: Evidence for helix movement in ligand binding? J. Mol. Struc. (Theochem), 1999, 469, 229-232.
87. Altenbach, C., Yang, K., Farrens, D. L., Farahbakhsh, Z. T., Khorana, H. G., Hubbell, W. L. Structural features and light-dependent changes in the cytoplasmic interhelical E-F loop region of rhodopsin: A site-directed spin-labelling study. Biochemistry, 1996, 35, 12470-12478.
88. Yang, K., Farrens, D. L., Altenbach, C., Farahbakhsh, Z. T., Hubbell, W. L., Khorana, H. G. Structure and function in rhodopsin. Cysteines 65 and 316 are in proximity in a rhodopsin mutant as indicated by disulfide formation and interactions between attached spin labels. Biochemistry, 1996, 35, 14040-14046.
89. Farahbakhsh, Z. T., Hideg, K., Hubbell, W. L. Photoactivated conformational changes in rhodopsin. A time resolved spin-label study. Science, 1993, 262, 1416-1419.
90. Scheer, A., Fanelli, F., Costa, T., De Benedetti, P. G., Cotecchia, S. Constitutively active mutants of the α1B-adrenergic receptor: Role of highly conserved polar amino acids in receptor activation. EMBO J., 1996, 15, 3566-3578.
91. Altenbach, C., Klein-Seetharaman, J., Hwa, J., Khorana, H. G., Hubbell, W. L. Structural features and light dependent changes in the sequence 59-75 connecting helices 1 and 2 in rhodopsin: A site-directed spin labelling study. Biochemistry, 1999, 38, 7945-7949.
92. Samama, P., Cotecchia, S., Costa, T., Lefkowitz, R. J. A mutation-induced activated state of the β2-adrenergic receptor: Extending the ternary complex model. J. Biol. Chem., 1993, 268, 4625-4636.
93. Kjelsberg, M. A., Cotecchia, S., Ostrowski, J., Caron, M. G., Lefkowitz, R. J. Constitutive activation of the α1B-adrenergic receptor by all amino acid substitutions at a single site: Evidence for a region which constrains receptor activation. J. Biol. Chem., 1992, 267, 1430-1433.
94. Arnis, S., Fahmy, K., Hofmann, K. P., Sakmar, T. P. A conserved carboxylic acid group mediates light dependent proton uptake and signalling by rhodopsin. J. Biol. Chem., 1994, 269, 23879-23881.
95. Czaplewski, C., Pasenkiewicz-Gierula, M., Ciarkowski, J. G-protein coupled receptor-bioligand interactions modelled in a phospholipid bilayer. Int. J. Quant. Chem., 1999, 73, 61-70.
96. Hulme, E. C., Lu, Z.-L., Ward, S. D. C., Allman, K., Curtis, C. A. M. The conformational switch in 7-transmembrane receptors: The muscarinic receptor paradigm. Eur. J. Pharmacol., 1999, 375, 247-260.
97. Mahmoudian, M. The complex of human Gs protein with β3-adrenergic receptor: A computer-aided molecular modelling study. J. Mol. Graph., 1994, 12, 22-34.
98. Sutcliffe, M. J., Haneef, I., Carney, D., Blundell, T. L. Knowledge-based modelling of homologous proteins, part I: Three-dimensional frameworks derived from the simultaneous superposition of multiple structures. Protein Engineering, 1987, 1, 377-384.
99. Sutcliffe, M. J., Hayes, F. R. F., Blundell, T. L. Knowledge-based modelling of homologous proteins, part II: Rules for the conformations of substituted sidechains. Protein Engineering, 1987, 1, 385-392.
100. Dratz, E. A., Furstenau, J. E., Lambert, C. G., Thireault, D. L., Rarick, H., Schepers, T., Pakhlevaniants, S., Hamm, H. E. NMR structure of a receptor-bound G-protein peptide. Nature, 1993, 363, 276-281.
101. Savarese, T. M., Fraser, C. M. In vitro mutagenesis and the search for structure-function relationships among G-protein-coupled receptors. Biochem. J., 1992, 283, 1-19.
102. Strader, C. D., Dixon, R. A. F., Cheung, A. H., Candelore, M. R., Blake, A. D., Sigal, I. S. Mutations that uncouple the beta adrenergic receptor from Gs and increase agonist affinity. J. Biol. Chem., 1987, 262, 16439-16443.
103. Masters, S. B., Miller, R. T., Chi, M.-H., Chang, F.-H., Beiderman, B., Lopez, N. G., Bourne, H. R. Mutations in the GTP-binding site of Gs alter stimulation of adenylyl cyclase. J. Biol. Chem., 1989, 264, 15467-15474.
104. O'Dowd, B. F., Hnatowich, M., Regan, J. W., Leader, W. M., Caron, M. G., Lefkowitz, R. J. Site-directed mutagenesis of the cytoplasmic domains of the human β2-adrenergic receptor. J. Biol. Chem., 1988, 263, 15985-15992.
105. Lambright, D. G., Sondek, J., Bohm, A., Skiba, N. P., Hamm, H. E., Sigler, P. B. The 2.0 Å crystal structure of a heterotrimeric G protein. Nature, 1996, 379, 311-319.
106. Sondek, J., Lambright, D. G., Noel, J. P., Hamm, H. E., Sigler, P. B. GTPase mechanism of G proteins from the 1.7-Å crystal structure of transducin α-GDP-AlF4−. Nature, 1994, 372, 276-279.
107. Lambright, D. G., Noel, J. P., Hamm, H. E., Sigler, P. B. Structural determinants for activation of the α-subunit of a heterotrimeric G protein. Nature, 1994, 369, 621-628.
108. Sunahara, R. K., Tesmer, J. J. G., Gilman, A. G., Sprang, S. R. Crystal structure of the adenylyl cyclase activator Gsα. Science, 1997, 278, 1943-1947.
109. Tesmer, J. J. G., Berman, D. M., Gilman, A. G., Sprang, S. R. Structure of RGS4 bound to AlF4−-activated Giα1: Stabilization of the transition state for GTP hydrolysis. Cell, 1997, 89, 251-261.
110. Oliveira, L., Paiva, A. C. M., Vriend, G. A low resolution model for the interaction of G proteins with G protein-coupled receptors. Protein Engineering, 1999, 12, 1087-1095.
111. Palczewski, K. GTP-binding-protein-coupled receptor kinases: Two mechanistic models. Eur. J. Biochem., 1997, 248, 261-269.
112. Ausiello, G., Cesareni, G., Helmer-Citterich, M. ESCHER: A new docking procedure applied to the reconstruction of protein tertiary structure. Proteins: Struct. Funct. Genet., 1997, 28, 556-567.
113. Lichtarge, O., Bourne, H. R., Cohen, F. E. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol., 1996, 257, 342-358.
375
114.
115.
116.
117.
118. 119.
120.
121.
122.
123.
124.
125. 126. 127.
128.
Lichtarge, O., Bourne, H. R., Cohen, F. E. Evolutionary conserved Gcz137 binding surfaces support a model of the G protein-coupled receptor complex. Proc. Natl. Acad Sci., 1996, 93, 7507-7511. Lichtarge, O., Yamamoto, K. R., Cohen, F. E. Identification of functional surfaces of the zinc binding domains of intracellular receptors. ~ Mol. Biol., 1997, 274, 325-337. Hamm, H.E., Deretic, D., Arendt, A., Hargrave, P. A., Koenig, B., Hofmann, K. P. Site of G-protein binding to rhodopsin mapped with synthetic peptides from the alpha subunit. Science, 1988, 241,832-835. Rasenick, M. M., Watanabe, M., Lazarevic, M. B., Hatta, S., Hamm, H. E. Synthetic peptides as probes for G-protein function - Carboxyl terminal Gas peptides mimic Gs and evoke high-affinity agonist binding to 13-adrenergic receptors. J. Biol. Chem., 1994, 269, 21519-21525. Conklin, B. R., Bourne, H. R. Structural elements of Got subunits that interact with G137, receptors and effectors. Cell, 1993, 73, 631-641. Garcia, P. D., Onrust, R., Bell, S. M., Sakmar, T. P., Bourne, H. R. Transducin-alpha C-terminal mutations prevent activation by rhodopsin: Anew assay using recombinant proteins expressed in cultured cells. EMBO J., 1995, 14, 4460-4469. Hebert, T. E., Moffett, S., Morello, J. P., Loisel, T. P., Bichet, D. G., Barter, C., Bouvier, M. A peptide derived from a [32-adrenergic receptor transmembrane domain inhibits both receptor dimerisation and activation. J. Biol. Chem., 1996, 271, 16384-16392. Monnot, C., Bihoreau, C., Conchon, S., Curnow, K. M., Corvol, P., Clauser, E. Polar residues in the transmembrane domains of the type 1 angiotensin II receptor are required for binding and coupling: Reconstitution of the binding site by co-expression of two deficient mutants. J. Biol. Chem., 1996, 271, 1507-1513. Ng, G. Y..K., O'Dowd, B. F., Lee, S. P., Chung, H. T., Brann, M. R., Seeman, P., George, S.R. Dopamine D2 receptor dimers and receptor-blocking peptides. Biochem. Biophys. Res. Comm., 1996, 227,200-204. 
Maggio, R., Vogel, Z., Wess, J. Co-expression studies with mutant muscarinic adrenergic receptors provide evidence for intermolecular cross-talk between G-protein linked receptors. Proc. Natl. Acad. Sci., 1993, 90, 3103-3107. Maggio, R., Barbier, P., Fornai, F., Corsini, G. U. Functional role of the third cytoplasmic loop in muscarinic receptor dimerisation. J. Biol. Chem., 1996, 271, 31055-31060. Cvejic, S., Devi, L. A. Dimerisation of the delta opioid receptor: Implication for a role in receptor internalisation. J. Biol. Chem., 1997, 272, 26959-26964. Kolakowski, F. http://www.gcrdb.uthscsa.edu/FB_intro.html Bai, M., Trivedi, S., Brown, E. M. Dimerisation of the extracellular calciumsensing receptor (CAR) on the cell surface of CaR-transfected HEK293 cells. J. Biol. Chem., 1998, 273, 23605-23610. Jones, K. A., Borowsky, B., Tamm, J. A., Craig, D. A., Durkin, M. M., Dai, M., Yao, W. J., Johnson, M., Gunwaldsen, C., Huang, L. Y., Tang, C., Shen, Q. R., Salon, J. A., Morse, K., Laz, T., Smith, K.E., Nagarathnam, D., Noble, S. A., Branchek, T. A., Gerald, C. Heterodimerisation is required for the formation of a functional GABAB receptor. Nature, 1998, 396, 674-679.
376
129.
130.
131. 132. 133.
134.
135.
136. 137.
138. 139.
140.
141.
White, J. H., Wise, A., Main, M. J., Green, A., Fraser, N. J., Disney, G. H., Barnes, A. A., Emson, P., Foord, S. M., Marshall, F. H. Heterodimerization is required for the formation of a functional GABA~ receptor. Kaupmann, K., Malitschek, B., Schuler, V., Heid, J., Froest, W., Beck, P., Mosbacher, J., Bischoff, S., Kulik, A., Shigemoto, R., Karschin, A., Bettler, B. GABAB receptor subtypes assemble into functional heterotrimeric complexes. Nature, 1998, 396, 6.83-687. Romano, C., Yang, W. L., O'Malley, K. L. Metabotropic glutamate receptor 5 is a disulfide-linked dimer. J. Biol. Chem., 1996, 271,28612-28616. Reynolds, C. A., King, P. M., Richards, W. G. Free-energy calculations in molecular biophysics. Molecular Physics, 1992, 76, 251-275. Pazos, F., Helmer-Citterich, M., Ausiello. G., Valencia, A. Correlated mutations contain information about protein-protein interaction. J. Mol. Biol., 1997, 271, 511-523. Dean, M. K., Higgs, C., Smith, R. E., Snell, C. R., Bywater, R. P., Scott, P. D., Reynolds. C. A. Dimerisation: A general feature of G-protein coupled receptors? To be submitted. Davis, M. E., Madura, J. D., Sines, J., luty, B. A., Allison, S. A., McCammon, J. A. Diffusion controlled enzymatic reactions. Meth. Enzymol., 1991, 202, 473-497. Bergman, D. L., Laaksonen, L., Laaksonen, L. Gopenmol. J. Mol. Graph. Modelling, 1997, 15, 301. Mumby, S. M., Heukeroth, R. O., Gordon, J. I., Gilman, A. G. G-protein (xsubunit expression, myristoylation and membrane association in COS cells. Proc. Natl. Acad. Sci., 1990, 87, 728-732. Yang, Z., Wensel, T. G. N-Myristoylation of the rod outer segment G-protein, Transducin, in cultured retinas. J. Biol. Chem., 1992, 267, 23197-23201. Linder, M. E., Middleton, P., Hepler, J. R., Taussig, R., Gilman, A. G., Mumby, S. M. Lipid modifications of G-proteins: (x subunits are palmitoylated. Proc. Natl. Acad. Sci., 1993, 98, 3675-3679. Krieger-Brauer, H. I., Medda, P. K., Hebling, U., Kather, H. 
An antibody directed against residues 100-119 within the (x-helical domain of G~s defines a novel contact site for [3-adrenergic receptors. J. Biol. Chem., 1999, 274, 28308-28313. Onrust, R., Herzmark, P., Chi, P., Garcia, P. D., Lichtarge, O., Kingsley, C., Bourne, H. R. Receptor and [3• binding sites in the et subunit of the retinal Gprotein Transducin. Science, 1997, 275, 381-384.
L.A. Eriksson (Editor)
Theoretical Biochemistry: Processes and Properties of Biological Systems
Theoretical and Computational Chemistry, Vol. 9
© 2001 Elsevier Science B.V. All rights reserved
Chapter 10
Protein-DNA Interactions in the Initiation of Transcription: The Role of Flexibility and Dynamics of the TATA Recognition Sequence and the TATA Box Binding Protein
Nina Pastor^a and Harel Weinstein^b
^a Facultad de Ciencias, U. Autónoma del Estado de Morelos, Av. Universidad 1001, Col. Chamilpa, 62210 Cuernavaca, Morelos, México
^b Dept. of Physiology and Biophysics, Mount Sinai School of Medicine, One Gustave L. Levy Place, New York, NY 10029, U.S.A.

Initiation of transcription by RNA polymerase II on TATA box-containing promoters requires the assembly of a complex between the TATA box binding protein (TBP) and the TATA element in DNA. This complex forms a platform on which the remaining general transcription factors assemble. As TATA elements are composed primarily of A·T base pairs, the sequence specificity dictated by the TATA box consensus sequence (TATA@A@X) is not easily explained by direct readout alone. Nor is it clear from structural data alone how protein-DNA interaction produces the striking structural features observed in all the crystal structures of TBP complexes: the ~90° bend in the DNA and the untwisting of the DNA helix throughout the eight base pairs contacted by TBP. We review here the current answers to these mechanistic questions, including the sources of DNA sequence selectivity in TBP-DNA complexes, in light of the sequence-dependent properties of the various TATA boxes. Results from specific experiments, as well as from molecular dynamics simulations and energy calculations, are used to arrive at proposed mechanisms that rationalize both structural and kinetic data. Molecular dynamics simulations and recent experimental data reveal a surprising role for the structural flexibility of TBP in these mechanisms.

1. TBP AND TRANSCRIPTION
TBP is a general transcription factor present in archaea [1] and eukarya [2], and is required for transcription mediated by the three nuclear RNA polymerases (RNApol) [3]. TBP exists in the cell as a complex with TBP-associated factors (TAFs) [4], which form at least four distinct groups (SL1, TFIID, TFIIIB, and SNAPc), each corresponding to a particular polymerase [5]. The core promoter of genes transcribed by RNApol-II may have a TATA box, an initiator, both, or neither [6]. In promoters that contain a TATA box (consensus sequence TATA@A@X, where @ stands for an A·T base pair (bp) in either orientation [7]; see www.epd.isb-sib.ch/promoter_elements/), this motif is normally situated 30 bp upstream of the transcription initiation site. The efficiency of transcription elicited by this type of core promoter is related to the affinity of TBP for the TATA box [8-10]. Promoters without a recognizable TATA box recruit TBP through protein-protein interactions, as some of the TAFs have sequence-specific affinity for the initiator or for a downstream promoter element [11]; this mechanism is also used in RNApol-I and RNApol-III transcription, because the genes transcribed by these two polymerases generally lack TATA boxes. Our presentation here focuses on RNApol-II transcription of TATA box-containing promoters.

Transcription initiation requires the assembly of a preinitiation complex (PIC), which involves the binding of TBP to the DNA (either alone or as part of TFIID), followed by the binding of the remaining general transcription factors (TFIIA, TFIIB, TFIIE, and TFIIH) and the polymerase. This can happen in a stepwise fashion [12] or by recruitment of a large complex that includes the RNA polymerase, TFIIE, TFIIH, and other proteins (SRBs) [9,10,13,14]. Assembly of the PIC is strictly regulated by both activators and repressors, principally through interactions with TBP, TAFs, TFIIB and TFIIA [6,15]. Recently, a family of related proteins, the TBP-like proteins or TLPs [16], has been identified in metazoa; these proteins are also able to bind DNA, albeit not at consensus TATA boxes. The physiological role of TLPs is an area of intense research, and some proposals suggest that they sequester other general transcription factors (such as TFIIA), thereby repressing RNApol-II transcription mediated by TBP [17].
1.1. Structural biology of TBP

TBP sequences are available from the NCBI server (www.ncbi.nlm.nih.gov/) for organisms representing archaea and the four kingdoms of eukarya. In all cases, the amino acid sequences contain a conserved C-terminal domain of ~180 amino acids and a variable N-terminus ranging from 1 to 172 residues. The C-terminal domain appears to have arisen by duplication of an ancestral protein before the separation of archaea and eukarya, which divides this domain into two subdomains with ~40% sequence identity [1]. Crystal structures have been obtained for the C-terminal domain of free TBP from three different species (P. woesei [18], A. thaliana [19,20], and S. cerevisiae [21]), revealing a saddle-like structure with stirrups, formed by a 10-stranded β-sheet and four α-helices and reflecting the imperfect repeats found in the sequence. All these structures crystallized as dimers, and the dimerization interface corresponds to the underside of the saddle, which is also the DNA binding surface. TBP binds to DNA as a monomer, a finding congruent with the TBP-DNA binary complex structures determined by X-ray crystallography [22-26]. The complexes reported to date show that TBP binds in the minor groove of DNA, contacting eight bp. While TBP appears to undergo only small structural adjustments upon complex formation, the DNA is severely kinked (two ~45° kinks at the first and last base-pair steps (bps) of the TATA box) and unwound by ~100°, with consequent widening of the minor groove and compression of the major groove. Despite this deformation, all bp retain their Watson-Crick hydrogen bonds.

Ternary complexes formed by TBP, DNA and TFIIA or TFIIB have also been crystallized [27-32], and their structures are available in the PDB and NDB (see Table 1). These complexes display practically the same mode of interaction between TBP and the DNA as found in the corresponding binary complexes. They also explain the inability of TFIIB to bind to DNA on its own: it contacts the DNA both upstream and downstream of TBP, which is impossible on a straight DNA molecule. This feature places TBP among the architectural transcription factors, together with UBF, HMG, SRY and LEF1 [33]. TFIIA binds upstream of TBP and extends the molecular surface generated by the 10-stranded β-sheet by apposing one of its subunits to the C-terminal subdomain of TBP. The largest TAF in TFIID inhibits TBP binding to DNA. The structure of a domain located in the N-terminus of D. melanogaster TAF230 complexed with S. cerevisiae TBP was obtained by NMR [34]. This structure explains how DNA binding is inhibited: the TAF230 domain adopts a structure reminiscent of bound DNA, an excellent example of molecular mimicry.
A key example of this mimicry is the strategic placement of charged amino acid residues to simulate phosphates. Table 1 summarizes all these structures, including their accession numbers in the PDB [35] and NDB [36] and the primary references. The structures of larger aggregates leading to the complete PIC have not yet been amenable to X-ray crystallography. Nonetheless, crosslinking studies have greatly helped in understanding the spatial disposition of general transcription factors and TAFs along a core promoter [37-41]. Crystal structures have also been reported for stable tryptic fragments of two of the smaller TAFs, showing a surprising structural homology to histones [42], and NMR structures were obtained separately for the N-terminal [43] and C-terminal [44] domains of TFIIB. Interestingly, the C-terminal domain of TFIIB is similar in architecture to cyclins [45].
Table 1 TBP structures in the PDB and the NDB

PDB    NDB          TBP molecules              Resolution (Å)   Reference
TBP
1VOK   NBP0231      ATH dimer                  2.1              [20]
1TBP   NBP0229      SCE dimer                  2.6              [21]
1PCZ   NBP0226      PWO dimer                  2.2              [18]
TBP-DNA
       PDT009/025   ATH / MLP                  1.9              [24]
1YTB   PDT012       SCE / CYC1                 1.8              [23]
1CDW   PDT034       HSA / MLP                  1.9              [25]
1TGH   PDT024       HSA / E4                   2.9              [26]
TERNARY
1VOL   PDT032       ATH / TFIIB / MLP          2.7              [27]
1YTF   PDT036       SCE / TFIIA / CYC1         2.5              [28]
1AIS   PDR031       PWO / TFB / EF1α           2.1              [30]
1D3U   PD0070       PWO / TFB / T6 + BRE       2.4              [31]
TBP-TAF
1TBA                SCE / dTAFII230 (11-77)    NMR              [34]

ATH: A. thaliana; SCE: S. cerevisiae; PWO: P. woesei; HSA: H. sapiens; TFIIB: C-terminal domain of transcription factor IIB; TFIIA: fragments of transcription factor IIA; TFB: P. woesei homolog of TFIIB; MLP: d(CTATAAAAGGGC); CYC1: d(GTATATAAAACG); E4: d(CGTATATATACG); EF1α: d(ACTTTTTAAAGC); T6 + BRE: d(AGAGTAAAGTTTAAATACTTATAT).
1.2. Kinetics and thermodynamics of TATA box recognition and binding

TBP binds slowly to DNA (k_a = 10^5 M^-1 s^-1) [46], and dissociates even more slowly (for very stable complexes, the lifetime of the complex can exceed two hours) [47]. The need for this remarkable stability of the TBP-DNA complex has been rationalized in terms of its role as a launching pad for multiple rounds of transcription [2]. The fact that TBP binds more slowly than diffusion would allow has led to much discussion in the field [48]. The mechanisms advanced to explain this are as follows: a) the limiting step is the dissociation of TBP dimers that exist in solution before DNA binding [49-54]; b) binding occurs in two steps, with the formation of a non-specific complex and subsequent sliding along the DNA [55]; c) binding occurs in two steps, and the limiting step is the isomerization of DNA from a straight to a bent form [46]; d) binding is slow because TBP binds to a tiny population of DNA molecules that are ready for binding (e.g. pre-bent), assuming that bending and binding occur simultaneously [56-60]. A large body of experimental evidence has been interpreted to fit one or more of these models, but there is still no consensus, in part owing to differences in experimental conditions. The most recent data suggest that bending and binding occur at the same time, and that there appear to be two intermediates [60]. Also, correctly pre-bent [61] or more flexible DNA is bound better by TBP and dissociates more slowly [8,62]. A structural model for these intermediates would help discriminate among the alternatives.

TBP binds to canonical TATA boxes with nanomolar affinities [56,59]. The free energy of binding shows a modest salt dependence that is not DNA sequence dependent [59]. According to the model proposed by Record [63], based on Counterion Condensation Theory [64], this salt dependence corresponds to the release of ~3 cations from the surface of DNA upon complex formation. This number should be considered with caution, however, as TBP binding is sensitive to the type of anion present in the buffer (M. Brenowitz, personal communication). Most interesting is the variation of affinity with temperature: a van't Hoff analysis revealed a large decrease in heat capacity upon complex formation, and the magnitude of the change is sequence dependent (from -3.5 kcal/mol·K for a very good promoter, MLP (see Table 1), to almost zero for poor promoter sequences; M. Brenowitz, personal communication and [56,59]).
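Two of the numbers in this section can be cross-checked with back-of-the-envelope estimates. The sketch below is a minimal illustration under stated assumptions: it treats binding as a single step, so that k_off = K_d·k_a; it takes K_d = 1 nM as a stand-in for "nanomolar affinity" (an assumed round value, not a measured constant); and it applies Record's model with an assumed coefficient ψ ≈ 0.88 thermodynamically bound cations per B-DNA phosphate, so that the number of cations released equals minus the slope of log K versus log [M+]. The salt-titration points are synthetic, constructed for illustration only, and all function names are hypothetical.

```python
from typing import Sequence

# Assumed inputs: k_a is the order of magnitude quoted in the text; K_d = 1 nM is an
# illustrative value standing in for "nanomolar affinity" (not a measured constant).
K_ASSOC = 1.0e5   # association rate constant, M^-1 s^-1
K_D_EQ = 1.0e-9   # equilibrium dissociation constant, M

# Record's counterion coefficient for B-DNA phosphates (assumed applicable here).
PSI_BDNA = 0.88


def dissociation_rate(k_on: float, k_d: float) -> float:
    """k_off = K_d * k_on for a one-step A + B <-> AB scheme (s^-1)."""
    return k_on * k_d


def lifetime_hours(k_off: float) -> float:
    """Mean complex lifetime 1/k_off, expressed in hours."""
    return 1.0 / k_off / 3600.0


def salt_slope(log_salt: Sequence[float], log_k: Sequence[float]) -> float:
    """Least-squares slope S_K = d log K / d log [M+]."""
    n = len(log_salt)
    mx = sum(log_salt) / n
    my = sum(log_k) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_salt, log_k))
    sxx = sum((x - mx) ** 2 for x in log_salt)
    return sxy / sxx


def counterion_release(slope: float, psi: float = PSI_BDNA) -> tuple[float, float]:
    """Record model: cations released = -S_K; phosphate ion pairs formed = -S_K / psi."""
    return -slope, -slope / psi


if __name__ == "__main__":
    k_off = dissociation_rate(K_ASSOC, K_D_EQ)
    print(f"k_off ~ {k_off:.1e} s^-1, lifetime ~ {lifetime_hours(k_off):.1f} h")
    # Synthetic titration (log10 [Na+] vs log10 K), constructed with a slope of -3.
    log_salt = [-1.0, -0.8, -0.6]
    log_k = [9.0, 8.4, 7.8]
    cations, ion_pairs = counterion_release(salt_slope(log_salt, log_k))
    print(f"cations released ~ {cations:.1f}, phosphate contacts ~ {ion_pairs:.1f}")
```

With these inputs, k_off is 10^-4 s^-1 and the mean lifetime is 10^4 s, a little under three hours, which is the same order as the complex lifetimes cited above. The slow k_a is what makes such long lifetimes compatible with only nanomolar affinity.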
The standard model used to interpret the change in heat capacity relates it to the dehydration of the complex interface [65], and holds that it represents the increased mobility of the previously surface-bound water molecules. In this case, however, the solvent-accessible surface area shielded from the solvent upon TBP-MLP complexation can account for only ~20% of the decrease in heat capacity, an indication that other mechanisms are responsible for the bulk of the change. Sturtevant [66] suggested that changes in hydrogen bond strength and in the population of vibrational states can also contribute to this quantity. As the structures of different TBP-DNA complexes are very similar, implying very similar decreases in solvent-accessible area, these two dynamic contributions might be responsible for the DNA sequence dependence of the heat capacity change.

Because the formation of a stable TBP-DNA complex is crucial for the expression of a wide variety of genes, it is imperative that the mechanism of its formation be understood. This is particularly timely now, as the rapid sequencing of genomes and the prospect of gene therapy call for a deeper understanding of the factors that govern transcription. To achieve such an understanding, the wealth of experimental data can be productively complemented by molecular modeling and simulations, in which individual contributions can be tested in a more controlled fashion. Section 2 below describes such computational efforts to understand how TBP achieves sequence-specific recognition.
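A nonzero heat capacity change, as discussed in Section 1.2, shows up as curvature in a van't Hoff plot of ln K against 1/T. The sketch below is a minimal illustration assuming that ΔCp is constant over the temperature range, so that ΔH(T) = ΔH_ref + ΔCp(T - T_ref), ΔS(T) = ΔS_ref + ΔCp ln(T/T_ref) and ΔG(T) = ΔH(T) - TΔS(T). Only ΔCp = -3.5 kcal/mol·K is taken from the text (MLP); the reference association constant and enthalpy are hypothetical placeholders, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def association_constant(T: float, K_ref: float, T_ref: float,
                         dH_ref: float, dCp: float) -> float:
    """K_a(T) from the integrated van't Hoff equation with a temperature-independent dCp.

    dH_ref in kcal/mol, dCp in kcal/(mol K), temperatures in K.
    """
    dG_ref = -R_KCAL * T_ref * math.log(K_ref)
    dS_ref = (dH_ref - dG_ref) / T_ref
    dH = dH_ref + dCp * (T - T_ref)           # enthalpy drifts linearly with T
    dS = dS_ref + dCp * math.log(T / T_ref)   # entropy drifts logarithmically
    dG = dH - T * dS
    return math.exp(-dG / (R_KCAL * T))


if __name__ == "__main__":
    # Hypothetical reference state: K = 1e9 M^-1 and dH = 0 at 25 C; dCp as quoted for MLP.
    for t_c in (5, 15, 25, 35, 45):
        T = 273.15 + t_c
        K = association_constant(T, K_ref=1e9, T_ref=298.15, dH_ref=0.0, dCp=-3.5)
        print(f"{t_c:>3} C  1/T = {1 / T:.5f}  ln K = {math.log(K):6.2f}")
```

With a negative ΔCp the enthalpy changes sign where it vanishes, so ln K passes through a maximum there and the van't Hoff plot is curved; setting dCp=0.0, as for the poor promoters described above, makes ln K linear in 1/T.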
2. TATA BOX SEQUENCE SPECIFIC RECOGNITION

TBP-DNA complexes offer a unique opportunity to dissect the various elements that participate in sequence-specific recognition in the minor groove of DNA. The molecular surface of the minor groove is regarded as poor in information content, and until very recently it was considered to be degenerate for A·T-rich DNA, because the positions of the hydrogen bond acceptors (thymine O2 and adenine N3) and of the adenine C2 are almost equivalent for A·T and T·A bp [67]. Dervan and coworkers designed a polyamide capable of discriminating between these two bp by combining pyrroles and hydroxypyrroles side by side [68]. In the following sections we explore possible molecular mechanisms whereby TBP selects TATA boxes from other A·T-rich DNA sequences.
2.1. The role of direct readout

TBP forms a large interface with DNA of roughly 2200 Å². This contact surface can be thought of as comprising five tiers of interactions, going from the phosphate groups and the sugar moieties to the bases and the C2 position of adenine, which lines the center of the minor groove. Detailed maps of these interactions can be found in the original articles reporting the crystal structures of the complexes (see Table 1) and will not be repeated here. The DNA sequences crystallized in complex with TBP to date are given below, with the numbering of the bp and bps used throughout:

bp number      1 2 3 4 5 6 7 8
bps number      1 2 3 4 5 6 7
MLP            T A T A A A A G
CYC1           T A T A T A A A
CYC1B          T A T A A A A C
E4             T A T A T A T A
EF1α           T T T T T A A A
T6             T T T A A A T A
An alignment of the amino acid sequences of the proteins crystallized in complex with TATA boxes is given in Table 2, together with a compilation of mutated loci that impair binding to DNA. The remarkable features of this interface are that it is anhydrous, mostly hydrophobic, and has very few hydrogen bonds between TBP side chains and the base edges. A group of valine, leucine and proline residues makes contacts with the C2 position of adenine, thereby selecting against guanine-containing DNA sequences. In this sense, TBP uses the same mechanism as lexitropsins to read the minor groove, as was pointed out by Juo et al. [26]. Hydrogen bonds to the rims of the bases are found only at the dyad axis of the TATA box (at bp 4 and 5 in mesophile complexes, and at bp 3, 4 and 5 in the hyperthermophile ternary complex), where asparagine (9 and 99), threonine (64 and 155) and serine (155 in PWO) residues act as donors to the N3 and O2 positions of adenine and thymine, respectively. For AA steps (as in MLP and CYC1B), only five hydrogen bonds are found, while for AT steps (as in CYC1 and E4), six hydrogen bonds can be seen in the crystal structures. The only difference is the rotamer populated by one of the threonine residues: it interacts either with the base rim or with a nearby sugar oxygen. It is interesting to note that the bps at the dyad of the TATA box is the most unwound in the whole recognition site, and that it is also a place where sequence-specific interactions could play a role in determining the selectivity of binding. Sugar rings are contacted by polar residues such as glutamine, threonine and serine, and also by hydrophobic side chains belonging to valine and isoleucine residues. These interactions appear to stabilize the minor groove in an open conformation. Not all the phosphates are engaged in salt bridges or hydrogen bonds to TBP.
Indeed, the first two reported complex structures had arginine and lysine residues in the vicinity of DNA that interacted with other residues in TBP rather than with the phosphates. These basic residues stabilize the complex by electrostatic neutralization of the negative charge of the DNA, by direct interactions and/or by generating a positive electrostatic potential [24,81]. The large deformation of the DNA helix involves the insertion of two pairs of phenylalanine rings (F39, F56, F130 and F147) into bps 1 and 7 of the TATA element. This interaction produces ~45° kinks, coupled to untwisting and vertical separation of the bp. In summary, the deformation is large at two bps, and there are few special interactions between TBP and DNA: at the first TA step of the TATA box (where F130 and F147 are inserted), and at the central bps, where N9, T64, N99 and T155 make hydrogen bonds to the rims of the bases. These interactions are depicted in Figure 1. To evaluate the energies involved, quantum chemical calculations were carried out on molecular models, looking for energetic differences in the interactions that could explain sequence selectivity [82,83].
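The bp/bps numbering used in this section can be made explicit in a few lines. The sketch below (function and constant names are hypothetical) lists the seven steps of each crystallized TATA element, reports the two phenylalanine insertion steps (bps 1 and 7) and the central step at the dyad (bps 4, between bp 4 and bp 5), and attaches the base-rim hydrogen-bond counts that the text gives for central AA and AT steps. As an assumption, those counts are applied only to the four complexes the text names (MLP, CYC1B, CYC1, E4) and are not extrapolated to the archaeal EF1α and T6 complexes.

```python
# TATA elements (bp 1-8) of the crystallized complexes, as listed in Section 2.1.
TATA_ELEMENTS = {
    "MLP": "TATAAAAG",
    "CYC1": "TATATAAA",
    "CYC1B": "TATAAAAC",
    "E4": "TATATATA",
    "EF1a": "TTTTTAAA",
    "T6": "TTTAAATA",
}

# Base-rim hydrogen bonds at the central step, as reported in the text:
# five for AA steps (MLP, CYC1B) and six for AT steps (CYC1, E4).
CENTRAL_HBONDS = {"AA": 5, "AT": 6}
HBOND_EXAMPLES = {"MLP", "CYC1", "CYC1B", "E4"}  # counts not extrapolated beyond these


def base_pair_steps(site: str) -> list[str]:
    """Steps of a site read 5'->3'; step n (1-based) lies between bp n and bp n+1."""
    return [site[i:i + 2] for i in range(len(site) - 1)]


def describe(name: str) -> dict:
    steps = base_pair_steps(TATA_ELEMENTS[name])
    central = steps[3]  # bps 4: the dyad step between bp 4 and bp 5
    hbonds = CENTRAL_HBONDS.get(central) if name in HBOND_EXAMPLES else None
    return {
        "kink_steps": (steps[0], steps[6]),  # bps 1 and 7: phenylalanine insertion sites
        "central_step": central,
        "central_hbonds": hbonds,
    }


if __name__ == "__main__":
    for name, site in TATA_ELEMENTS.items():
        print(f"{name:6s} {site}  {describe(name)}")
```

Run on the six sites, the sketch makes the point of this section concrete: the central step alternates between AA and AT across functional promoters, and the hydrogen-bond count changes by only one.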
Table 2 Sequence alignment of the C-terminal domain of TBP

ATH2 (19)     1  SGIVPTLQNI VSTVNLDCKL DLKAIALQAR NAEYNPKRFA AVIMRIREPK
SCE (61)      1  SGIVPTLQNI VATVTLGCRL DLKTVALHAR NAEYNPKRFA AVIMRIREPK
HSA (155)     1  SGIVPQLQNI VSTVNLGCKL DLKTIALRAR NAEYNPKRFA AVIMRIREPR
PWO (5)       1  SKVKLRIENI VASVDLFAQL DLEKVLDLCP NSKYNPEEFP GIICHLDDPK

ATH2         51  TTALIFASGK MVCTGAKSED FSKMAARKYA RIVQKLGF-PA KFKDFKIQNI
SCE          51  TTALIFASGK MVVTGAKSED DSKLASRKYA RIIQKIGF-AA KFTDFKIQNI
HSA          51  TTALIFSSGK MVCTGAKSEE QSRLAARKYA RVVQKLGF-PA KFLDFKIQNM
PWO          51  VALLIFSSGK LVVTGAKSVQ DIERAVAKLA QKLKSIGVKFK RAPQIDVQNM

ATH2        101  VGSCDVKFPI RLEGLAYSHA AFSSYEPELF PGLIYRMKVP KIVLLIFVSG
SCE         101  VGSCDVKFPI RLEGLAFSHG TFSSYEPELF PGLIYRMVKP KIVLLIFVSG
HSA         101  VGSCDVKFPI RLEGLVLTHQ QFSSYEPELF PGLIYRMIKP RIVLLIFVSG
PWO         101  VFSGDIGREF NLDVVALTLP N-CEYEPEQF PGVIYRVKEP KSVILLFSSG

ATH2        151  KIVITGAKMR DETYKAFENI YPVLSEFRKI
SCE         151  KIVLTGAKQR EEIYQAFEAI YPVLSEFRKM
HSA         151  KVVLTGAKVR AEIYEAFENI YPILKGFRKT
PWO         151  KIVCSGAKSE ADAWEAVRKL LRELDKYGLL
The TBP sequences are identified as in Table 1, with the most N-terminal residue present in the alignment indicated in parentheses. Residue numbering in the alignment (1 to 180) is defined by this table; (-) indicates an insertion/deletion. The secondary structure assignment is indicated below the last sequence in the alignment; residues 100% conserved in all available TBP sequences (> 50) are shown above the corresponding position. DNA: * indicates residues in contact (within 5 Å) with DNA; ! = cis proline. Mutation: # indicates a locus causing specificity relaxation [69-71]; @ indicates a mutant with impaired DNA binding [3,69,72-80]; $ indicates a DNA binding mutant whose DNA binding cannot be rescued by TFIIA and TFIIB [75].
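Identity and conservation in an alignment such as Table 2 reduce to simple column-wise comparisons. The sketch below is illustrative only: the caption's "100% conserved" marks refer to more than 50 TBP sequences, whereas this example works on the four rows of the table, and the function names are hypothetical. Gap columns are excluded from both counts, which is one common convention, assumed here.

```python
def conserved_columns(rows: list[str]) -> list[int]:
    """1-based alignment columns where all rows carry the same residue (gaps excluded)."""
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("aligned rows must have equal length")
    return [i + 1 for i in range(length)
            if rows[0][i] != "-" and all(r[i] == rows[0][i] for r in rows)]


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


if __name__ == "__main__":
    # First ten alignment columns of Table 2 (ATH2, SCE, HSA, PWO).
    block = ["SGIVPTLQNI", "SGIVPTLQNI", "SGIVPQLQNI", "SKVKLRIENI"]
    print("identical in all four rows:", conserved_columns(block))
    print("SCE vs HSA: %.0f%%" % percent_identity(block[1], block[2]))
    print("ATH2 vs PWO: %.0f%%" % percent_identity(block[0], block[3]))
```

On this ten-column block, only columns 1, 9 and 10 are identical in all four rows, and the eukaryotic rows are far more similar to each other than to the archaeal PWO row.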
Figure 1: Interactions at the TBP-DNA interface, and the drastic structural deformation of the DNA. The upper panel shows the Cα trace of the C-terminal domain of A. thaliana TBP (white) and the MLP TATA box (grey) from the PDT025 complex (see Table 1). In the lower panel, the complex is rotated by 90° relative to the upper panel, to illustrate the ~90° bend imposed on the DNA.
Figure 2: Direct readout by P131 (white) and L145 (dark gray) in the first TA step of the PDT025 complex (see Table 1). The DNA bps is depicted in light gray. P131 and L145 contact the C2 position of the adenines in the first TA step.
Figure 3: H-bonds to thymine at the TBP-DNA interface: central bps of PDT025 (left) and 1YTB (right).
The energy calculations showed that the phenylalanine residues involved in DNA kinking cannot discriminate A·T from T·A, or from C·G for that matter [83]. The calculations further showed that P131 and L145 of TBP select against guanine at this same bps, because they clash with the exocyclic amino group of the base, but these interactions are also unable to distinguish A·T from T·A (see Figure 2). Hence, the stringent requirement for a TA step at position 1 of the TATA box cannot be explained by direct readout. Calculations of hydrogen bonding energies at the central bps [82] indicated equal hydrogen bond strengths for A·T and T·A, again supporting the idea of stereochemical equivalence between these two bp. A close-up of these interactions is shown in Figure 3. Together, all these results underscore the inadequacy of direct readout as a mechanism for achieving sequence specificity in TATA box binding by TBP.

2.2. The energy cost of DNA bending: an alternative sequence-dependence mechanism

The great deformation found in TBP-bound DNA has led to the suggestion that the energy cost of bending, which is sequence dependent, might be a source of selectivity [5,22]. This contention is supported by experiments showing improved binding to i) TATA boxes in correctly phased curved DNA [61], and ii) TATA boxes with increased flexibility [8,62]. These two experimental setups illustrate two extreme views of how DNA can participate in complex formation: on the one hand, one assumes stable static bends in DNA that already point in the direction that TBP will later exaggerate in forming the complex; on the other hand, DNA is pictured as a flexible rod that is straight on average but makes large and frequent excursions into bent conformations, which are selected by TBP when it binds.
2.2.1 Stable bends

Following the idea of a pre-bent DNA, an intermediate conformation similar in structure to A-DNA was proposed [22]. The backbone of the DNA in complex with TBP (termed TA-DNA) is very close in conformation to that of A-DNA, and only the glycosyl bond torsion is in the range found in B-DNA [84]. Based on this, a B-DNA → A-DNA → TA-DNA series of transitions was proposed for the mechanism of complex formation [22,84]. Our molecular dynamics simulations showed that a smooth transition from A-DNA to the TA-DNA conformation is indeed made possible by changing the torsion angle of the glycosyl bond [85]. Lebrun et al. [86] proposed a smooth B-DNA → TA-DNA transition by pushing apart specific phosphates in the TATA box. This approach has also been applied to other DNA sequences that bind proteins in their minor groove, with similar results [87]. The driving force for the transition has not been determined, but some proposals have been advanced. We have suggested recently [83] that the transition is brought about when the phenylalanine residues insert at the kink positions, and that this effect is propagated into the rest of the structure. Elcock and McCammon [88] suggested that the distortion of the DNA is produced by an increase in phosphate repulsion, caused by the immersion of the phosphates in the low-dielectric medium created by the approaching TBP. The final conformation is stabilized by some of the arginines and lysines that line the phosphate-sugar backbone in the complex [86].

One way of assessing the likelihood of stable bends in the TATA structure is to analyze the dynamic behavior of various sequences in MD simulations. The collection of sequences we simulated is shown in Table 3. The different sequences are identified in boldface, to distinguish them from the DNA sequences crystallized in complexes with TBP and listed in Table 1. The dynamic behavior of the TBP-binding sequence in MLP has been explored with MD simulations by us [89] and by two other groups [90-92], with two different force fields (CHARMM23 [93] and AMBER 4.0 [94]) and two different schemes for evaluating the electrostatic energy. In all cases, the starting B-DNA conformation underwent a transition to an A-like structure. Because CHARMM23 has a documented preference for the A-DNA conformation, while AMBER favors the B-DNA character of the structure [95,96], it is reassuring that all the MLP simulations reported thus far agree on an A-like structure. Notably, a series of DNA minicircle ligation efficiency experiments on small DNA fragments containing this same sequence were interpreted assuming that MLP is bent, but in the direction opposite to the one required by TBP [97]. This interpretation is based on the junction and wedge models for adenine tracts [98], so it is still open for discussion (see below). To test the hypothesis that efficient promoter sequences are more likely than other sequences to acquire an A-DNA-like conformation, we carried out a collection of molecular dynamics simulations of the double-stranded DNA dodecamers listed in Table 3. All these simulations were done with the CHARMM23 potential [93], in the presence of explicit solvent (~3500 TIP3 [99] water molecules) and 22 sodium ions (the simulation protocol is detailed in [89,100,101]).
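A-DNA and B-DNA differ chiefly in sugar pucker: C3'-endo, in the North range of the pseudorotation cycle, for A-DNA, versus C2'-endo, in the South range, for B-DNA. The sketch below shows how a pucker can be assigned from coordinates. It computes a torsion angle from four atom positions, then the Altona-Sundaralingam phase angle P from the five endocyclic torsions ν0 to ν4. The binary North/South split at P = 90° and 270° is a simplifying assumption, not the authors' stated criterion, and the function names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees (IUPAC sign convention, -180..180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def pseudorotation_phase(nu: tuple[float, float, float, float, float]) -> float:
    """Altona-Sundaralingam phase angle P (degrees, 0..360) from torsions nu0..nu4.

    nu0 = C4'-O4'-C1'-C2', nu1 = O4'-C1'-C2'-C3', nu2 = C1'-C2'-C3'-C4',
    nu3 = C2'-C3'-C4'-O4', nu4 = C3'-C4'-O4'-C1'.
    """
    nu0, nu1, nu2, nu3, nu4 = nu
    s = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))
    # atan2 keeps the correct quadrant when nu2 is negative (South puckers).
    p = math.degrees(math.atan2((nu4 + nu1) - (nu3 + nu0), 2.0 * nu2 * s))
    return p % 360.0


def pucker_range(p: float) -> str:
    """Simplified binary assignment: North around C3'-endo (P ~ 18), South around C2'-endo (P ~ 162)."""
    return "North" if (p < 90.0 or p >= 270.0) else "South"
```

For a time trace like the one described below, one would compute ν0 to ν4 with dihedral for each nucleotide in each MD frame, then pass them to pseudorotation_phase; the same dihedral function also gives the glycosyl torsion χ discussed above.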
The DNA sequences were chosen to include known functional promoters (MLP, MLP2, AT, E4, 6T, CYC1, EF1A and R28), nonfunctional promoters that can function with a mutant TBP (2C and 7G) [69,70], an inosine variant that can promote transcription (I) [102], and negative controls (GC, POLYA). This collection also includes two pairs of TATA boxes located in different contexts (MLP and MLP2, and AT and E4) in order to explore the sensitivity of the results to end effects. All the simulations started from a canonical B-DNA conformation and relaxed into a structure closer to A-DNA; however, after 2 ns of simulation not all the sequences had reached the same structure (Table 3), an indication that the simulation protocol is capable of identifying sequence-dependent features.
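RMSD values such as those in Table 3 are normally computed after least-squares superposition of each snapshot onto the reference structure (canonical A- or B-DNA) and then averaged over the final part of the trajectory. The text does not state the fitting algorithm; the sketch below assumes the Kabsch (SVD) method, NumPy, and coordinate arrays with matching atom order. `kabsch_rmsd` and `mean_rmsd` are hypothetical helper names.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body superposition.

    Rows must correspond to the same atoms in the same order, e.g. the
    TATA-box atoms of one snapshot (P) and of a canonical A- or B-DNA model (Q).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)              # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                         # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (no reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q                  # rotate every row of P, compare with Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def mean_rmsd(frames, reference, n_last: int) -> float:
    """Average Kabsch RMSD to `reference` over the last `n_last` frames."""
    return float(np.mean([kabsch_rmsd(f, reference) for f in frames[-n_last:]]))
```

With snapshots saved every picosecond, `mean_rmsd(frames, a_dna_model, 100)` would correspond to an average over the last 100 ps; the saving interval is an assumption.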
Moreover, the behavior of TATA sequences duplicated in different simulations appears to remain consistent. The similarity to A-DNA is due primarily to the conformation of the sugar rings: most of the purines in the purine tracts present in these sequences become locked in the North range of the pseudorotation cycle, and the time evolution of this transition suggests that the motion is nucleated at the 3' end of the purine tract and propagates upwards along the sequence. The pyrimidine strand switches into the North range later, perhaps in response to the complementary strand. As a general rule, pyrimidines tend to switch back and forth between the South and North ranges during the simulations, as seen in the time traces in Figure 4.

Table 3. DNA oligomers simulated by molecular dynamics

Name    Sequence        Simulation length (ns)   RMSD from A-DNA (Å)   RMSD from B-DNA (Å)
MLP     CTATAAAAGGGC    2.04                     1.5                   4.0
MLP2    GCTATAAAAGGC    2.04                     1.1                   4.2
AT      ATATATATATAT    2.04                     2.7                   3.4
EF1A    ACTTTTTAAAGC    2.04                     1.6                   3.0
6T      CTATATAAGGGC    2.04                     1.5                   4.1
CYC1    GTATATAAAACG    2.04                     1.3                   4.0
R28     CTTTTATAGGGC    2.04                     1.2                   4.5
E4      CGTATATATACG    1.02                     2.4                   3.8
2C      CCATAAAAGGGC    2.04                     1.0                   4.2
I       CTITIIIIGGGC    2.04                     1.9                   3.7
7G      CTATAAGAGGGC    2.04                     1.0                   4.1
GC      GCGCGCGCGCGC    2.04                     3.4                   2.0
POLYA   –               2.04                     1.7                   3.7

RMSD values are averaged over the last 100 ps of the production phase and refer to the TATA box only.

Two simulations stand out in Table 3 (AT and GC) because the resulting structures are the farthest from A-DNA; the same behavior is apparent for E4 at a shorter simulation time. This is actually expected, as alternating purine-pyrimidine sequences are supposed to adopt wrinkled-DNA conformations [103]. The time evolution of the RMSD for AT and GC is shown in Figure 5. These plots illustrate two important points. The first is that there are long-lived oscillations in the structure (see also [95]): had the GC simulation stopped after 1 ns, the conclusion would have been that all the sequences are equally likely to be in an A-DNA conformation. The other point is that the distance from A- and
B-DNA need not be symmetric, as seen for AT, which means that this particular simulation does not follow a straight path from B-DNA to A-DNA. It should also be noted that most of the structures of DNA oligomers determined by NMR have a conformation that is neither A- nor B-DNA, but exhibits characteristics of both kinds of helices [104,105]. Based on the data presented in Table 3, one can attempt to rank the sequences by their structural closeness to A-DNA, and see whether this ranking correlates with their ability to promote transcription and/or bind to TBP. By this criterion, the best promoter sequences would be 2C and 7G, followed by MLP2, R28, CYC1, MLP, 6T, EF1A, POLYA, I, AT and GC. The 2C and 7G sequences, which top the ranking, are not functional promoters for wild-type TBP. However, this is most likely due to a steric problem, namely the presence of the exocyclic amino group of guanine. This explanation is consistent with the fact that mutations in TBP can apparently overcome the problem and elicit transcription from these sequences [69,70]. Note, however, that the ranking also places a very efficient promoter (AT) [106,107] amongst the poor promoter sequences, suggesting that an average A-DNA character cannot be the sole feature that makes a DNA sequence suitable for TBP binding.
2.2.2. Flexibility
The analysis of the flexibility of these oligomers was done at the level of their constituent bps. This allows the calculation of all the local geometrical parameters of these bps (i.e., shift, slide, rise, tilt, roll and twist) and makes it possible to compare the values obtained from the simulations to those represented in the NDB for DNA oligomers of different lengths [89,100,101].
For this analysis, the geometrical parameters calculated [108] for the bps in all the trajectories were pooled together and then separated according to (i) the sequence type, i.e., purine-purine (RR), purine-pyrimidine (RY) and pyrimidine-purine (YR), and (ii) the specific nucleotides involved. The resulting data set reflects both a time average and an ensemble average over the set of simulations, thus improving the sampling statistics for the configuration space of the dodecamers. Figure 6 shows the normalized distributions of the values obtained from the simulations for AA, AT and TA steps, for the six bp step parameters. In all cases, the distributions for TA steps are wider than those of the other steps, and we interpret this as a reflection of the greater flexibility of this step. Except for tilt, all these parameters are important for understanding sequence-specific binding to TATA boxes [89,100,101,109], because TBP forces DNA into regions of conformational space that are not well sampled by free DNA, either in the simulations or in the available free DNA structures contained in the NDB. It is also clear from these plots that slide, rise and roll have a marked sequence dependence [110].
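The pooling procedure described above (merge all trajectories, split by step class, then plot normalized distributions) can be sketched as follows. This is an illustration under assumptions, not the authors' code: the step parameters are taken as already computed (e.g. by the program cited as [108]) and stored as one column per step, YY steps are folded into RR because they are RR steps on the complementary strand, and all names are hypothetical.

```python
from collections import defaultdict

import numpy as np

PURINES = set("AGI")  # inosine (I) is counted as a purine


def step_class(step: str) -> str:
    """Classify a 5'->3' dinucleotide step as RR, RY or YR.

    YY steps (e.g. TT) are reported as RR, since they are RR steps read on
    the complementary strand.
    """
    first, second = step[0] in PURINES, step[1] in PURINES
    if first == second:
        return "RR"
    return "RY" if first else "YR"


def pool_by_class(sequence: str, values_per_frame) -> dict:
    """Pool one step parameter (e.g. twist) over all frames, grouped by step class.

    values_per_frame: array of shape (n_frames, len(sequence) - 1), one column per step.
    """
    values = np.asarray(values_per_frame, dtype=float)
    pooled = defaultdict(list)
    for i in range(len(sequence) - 1):
        pooled[step_class(sequence[i:i + 2])].extend(values[:, i])
    return {cls: np.array(v) for cls, v in pooled.items()}


def normalized_histogram(values, bins):
    """Histogram whose heights sum to 1, as in a normalized distribution plot."""
    counts, edges = np.histogram(values, bins=bins)
    return counts / counts.sum(), edges
```

The standard deviation of each pooled array offers a one-number proxy for the width of a distribution, i.e. for the flexibility of that step class.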
[Figure 4: four pairs of panels (A6/T7, A7/T6, A8/T5, G9/C4) plotting δ (degrees, 60 to 150) against time (ps, 0 to 2000).]
Figure 4. Time evolution of the sugar dihedral angle δ for the AAAG bps in the MLP simulation.
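Time traces like those in Figure 4 can be reduced to two numbers per nucleotide: the fraction of time spent in the North range and the number of North/South switches. A minimal sketch, assuming a single cutoff of 115° on δ (North sugars typically give δ near 80°, South sugars near 140°); the cutoff and the function names are assumptions, not values from the text.

```python
import numpy as np


def classify_pucker(delta_deg, threshold: float = 115.0) -> np.ndarray:
    """Label each frame 'N' (North, C3'-endo-like) or 'S' (South, C2'-endo-like).

    Assumption: one cutoff on the backbone torsion delta separates the two ranges.
    """
    delta = np.asarray(delta_deg, dtype=float)
    return np.where(delta < threshold, "N", "S")


def pucker_summary(delta_deg, threshold: float = 115.0):
    """Return (fraction of frames in North, number of North/South switches)."""
    labels = classify_pucker(delta_deg, threshold)
    frac_north = float(np.mean(labels == "N"))
    switches = int(np.count_nonzero(labels[1:] != labels[:-1]))
    return frac_north, switches


if __name__ == "__main__":
    # Toy trace (deg): a South sugar that flips to North, back, and North again.
    print(pucker_summary([140, 138, 85, 82, 142, 80]))  # (0.5, 3)
```

A single cutoff counts noise near the boundary as switches; a two-threshold (hysteresis) rule would be a more conservative design choice.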
[Figure 5: panel A (GC) and panel B (AT) plotting the RMS difference (Å, 0 to 5) against time (ps, 0 to 2000).]
Figure 5. Time evolution of the RMS difference for the simulations GC (panel A) and AT (panel B), for bp 2-9 of the dodecamers listed in Table 3. Solid lines correspond to the deviation from B-DNA and broken lines to the deviation from A-DNA.
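Figure 5 shows why a single short window can misrepresent a trajectory that has slow, long-lived oscillations. One simple check, assumed here rather than taken from the chapter, is to split the RMSD time series into non-overlapping blocks (for instance 500 ps each for a 2 ns run) and compare the block averages; a large spread between blocks signals that the run is not yet converged on that time scale. The helper names are hypothetical.

```python
import numpy as np


def block_means(series, block_len: int) -> np.ndarray:
    """Means of consecutive, non-overlapping blocks; a trailing partial block is dropped."""
    x = np.asarray(series, dtype=float)
    n_blocks = len(x) // block_len
    return x[: n_blocks * block_len].reshape(n_blocks, block_len).mean(axis=1)


def block_spread(series, block_len: int) -> float:
    """Range (max - min) of the block means; requires at least one full block."""
    means = block_means(series, block_len)
    return float(means.max() - means.min())
```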
The average conformation of these steps corresponds to the position of the peaks in the distributions. Five of the seven bps in the complex have positive slide, and Figure 6B shows that none of the steps significantly populates the positive range of slide values. Furthermore, these data suggest that AT steps are the most appropriate of the three for achieving positive values of slide. By the same line of reasoning, TA steps are the best suited for the kink positions in the complex because of their tendency towards high values of rise, and AT steps are well suited to form part of the complex because of their positive roll. AA steps are the most rigid, especially regarding twist, so they should not be present in large numbers in functional TATA boxes. The trends seen for AA, AT and TA steps are also shared by the whole groups of RR, RY and YR steps, respectively. With this in mind, one would predict that AT, GC and E4 are good promoter sequences, because they are formed by an alternation of RY steps, which produce the appropriate roll profile, and YR steps, which are the most flexible and therefore the easiest to unstack and bend. Note that the roll profiles are asymmetric, showing a greater tendency to bend towards the major groove than towards the minor groove. This would be reflected in the anisotropic bendability invoked by Kahn and coworkers to explain the cyclization efficiency of DNA minicircles [97]. Where we disagree is in the direction of the anisotropy: they propose that MLP is bent towards the minor groove, and that this equilibrium conformation acts as an inhibitor of the formation of unproductive complexes. The present results identifying flexibility properties of the functional TATA sequences are also relevant for a long-standing problem in understanding TBP-TATA recognition: the directionality of binding.
The direction of TBP binding is very important because it determines which strand of DNA will be copied into mRNA. Kinetic studies have suggested that there is only a very weak preference between the two directions in which TBP can bind to a TATA box (66% of the time it binds in the direction found in the crystal structures) [111], and have proposed a very slow isomerization of the complexes formed in the "wrong" direction [48,111]. Incubating DNA with TBP, TAFs and other general transcription factors improves the selection of a binding direction, shifting the responsibility for choosing the direction to the TAFs and the general transcription factors [31,32]. The difficulty in understanding directionality stems from the high symmetry of the TBP residues involved in interactions with DNA. Kim and Burley [22] proposed that the MLP sequence is composed of a flexible region (TATA) and a rigid section (AAAG), and that the DNA-binding domain of TBP is likewise composed of a more flexible N-terminal subdomain and a more rigid C-terminal subdomain. Suzuki [109] extended this idea by considering the packing efficiency of the two subdomains. Thus, the directionality of binding is proposed to be governed by matching a flexible protein subdomain with a rigid section of DNA, and vice versa.
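To put "very weak" in energetic terms, the 66:34 split can be converted into a free-energy difference with ΔΔG = RT ln[p/(1 - p)]. This treats the split as a Boltzmann population ratio at an assumed 298 K, which is an approximation because the preference was measured kinetically [111]. A minimal sketch; the function name is hypothetical:

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def ddg_from_fraction(p: float, temperature: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) equivalent to a two-state population split p : (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return R_KCAL * temperature * math.log(p / (1.0 - p))


if __name__ == "__main__":
    print(round(ddg_from_fraction(0.66), 2))  # 0.39 kcal/mol, below RT (~0.59 kcal/mol)
```

Under these assumptions the preference amounts to about 0.4 kcal/mol, smaller than the thermal energy RT, which is consistent with calling it very weak.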
[Figure 6: six panels (A-F) showing normalized distributions of shift (Å), slide (Å, panel B), rise (Å), tilt (°), roll (°) and twist (°).]
Figure 6. Normalized distributions of the values of the bp step geometrical parameters for AA (solid line), AT (broken line) and TA steps (dotted line), from all the relevant simulations in Table 3.
The populations shown in Figure 6 agree with the way MLP was parsed, and further support for this idea comes from experiments in which DNA is made more flexible by including mismatches and bulges in the sequence [62]. For these sequences, TBP binds preferentially in the direction predicted by Kim and Burley and by Suzuki.
2.2.3. Free energy calculations
While the structural and dynamic considerations described above indicate the basis for the mechanism of selectivity, a direct way to determine which DNA sequences are the least costly to bend is to calculate the free energy cost of the conformational transition. Unfortunately, this is not straightforward, because of the large change in conformation. Instead of calculating the cost of bending the entire TATA element, we have studied segments of the TATA box that are more amenable to calculations of this sort. As described above, there are two steps in the TATA box that undergo very large changes in structure, mostly in rise, roll and twist. The requirement for a TA step at the beginning of the TATA box cannot be explained by direct readout, and there is an indication that this step is particularly flexible. We calculated the free energy necessary to drive a series of double-stranded DNA tetramers (GTAT, GATT, GAAT and GTTT) from B-DNA and A-DNA into the structure found in 1YTB. The details of the simulations can be found in [83]. These calculations were carried out within the potential of mean force (PMF) approximation, using the AMBER 4.0 potential [94]. The transition was driven by changing the dihedral angles of the central bps in these tetramers linearly over 51 windows, amounting to a total simulation time of 1.5 ns for each sequence (except for GTTT, which required 81 windows and 2 ns to achieve the transition; see [83]).
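The linear driving of the dihedrals over a fixed number of windows can be sketched as follows. This is not the protocol of [83]: it assumes that each restrained dihedral follows the shorter arc between its initial value (B- or A-DNA) and its target value (1YTB), and the function names are hypothetical.

```python
import numpy as np


def wrap_deg(angle):
    """Map angles in degrees onto the interval [-180, 180)."""
    return (np.asarray(angle, dtype=float) + 180.0) % 360.0 - 180.0


def dihedral_windows(start_deg, end_deg, n_windows: int = 51) -> np.ndarray:
    """Target values for each restrained dihedral in each window.

    Every dihedral moves linearly from its start to its end value along the
    shorter arc (170 -> -170 passes through 180, not through 0).
    Returns an array of shape (n_windows, n_dihedrals).
    """
    start = np.atleast_1d(np.asarray(start_deg, dtype=float))
    arc = wrap_deg(np.atleast_1d(np.asarray(end_deg, dtype=float)) - start)  # signed shortest arc
    lam = np.linspace(0.0, 1.0, n_windows)[:, None]                           # coupling parameter 0..1
    return wrap_deg(start + lam * arc)
```

For GTTT one would simply call the same function with `n_windows=81`.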
The energy-based ranking (from most to least favorable) produced by these calculations for the B- to TA-DNA transition is GTAT > GATT > GAAT = GTTT, which agrees well with the predictions made on the basis of the histograms in Figure 6. GAAT and GTTT are the least adaptable to this conformation because of a steric clash between the thymine methyl groups in the major groove (see also [112]). Interestingly, these sequences are found at the kink positions in 1AIS, the ternary hyperthermophile complex. In this complex the geometry of the kink is slightly different, with a higher rise and smaller slide, alleviating the methyl-methyl clash. This structural rearrangement is probably not free of energy cost, as this complex has a very different salt and temperature behavior compared to the mesophile complexes [113]. The calculations identify GTAT and GATT as energetically preferred, because the inter-basepair hydrogen bonds they form in the major groove stabilize the TA-DNA conformation. These hydrogen bonds have a better geometry in GTAT than in GATT, accounting for the further energetic preference for GTAT [83]. The same kind of calculation was performed for the double-stranded DNA
tetramers TATA and TAAA, corresponding to the sequences at the dyad of structures 1YTB and PDT025, respectively. This position is characterized by positive roll and the most severe unwinding in the whole element. The calculated transition was from A-DNA to the conformation in 1YTB and PDT025, using the same protocol as before [83]. The calculations started from A-DNA because of the tendency of these sequences to adopt A-DNA-like geometries in solution (see above). The conformational change is greater in 1YTB than in PDT025, and this is reflected in the free energy cost of the transition: 11.8 kcal/mol for TATA and 8.1 kcal/mol for TAAA. The conclusion from these calculations was that the hydrogen bonds formed between these bases and the asparagine and threonine residues located at the dyad of TBP are responsible for driving the conformational transition [82]. Most interestingly, if TAAA is forced into the conformation in 1YTB, the energetic cost climbs to 14.4 kcal/mol. This is again consistent with the behavior depicted in the histograms in Figure 6, in that AA steps have a narrower twist profile than AT steps, indicating that the population of unwound structures is smaller for AA than for AT steps. It is noteworthy that the MD simulations leading to the results in Figure 6 were done with the CHARMM23 potential [93], while the PMF calculations were done with the AMBER 4.0 potential [94]. The agreement between the conclusions derived from simulations with these different forcefields lends further credence to the inferences from these complex calculations.
2.3. The dehydration of the interface
In section 2.1 the contact interface between TBP and the minor groove of DNA was characterized as anhydrous. This is a common characteristic of all the TBP-DNA complexes available to date. As TBP presents a primarily hydrophobic surface to DNA, most of the hydrogen bond donors and acceptors at the interface are left unsatisfied by complexation. Hence, there is likely to be an enthalpic penalty associated with the dehydration of this surface. This penalty is compensated by the favorable increase in entropy associated with the release of the surface-bound water molecules into bulk solution. Following this reasoning, two aspects of hydration could contribute to sequence specificity: the ideal sequence would be one that coordinates a large number of water molecules but binds them least tightly. We carried out an extensive analysis of the hydration properties of DNA in the simulations MLP and I (see Table 3) [114], based on the proximity analysis developed in Beveridge's group [115,116] and implemented and improved by Mihaly Mezei [117]. The idea behind the proximity analysis is to partition the space surrounding DNA by placing bisectors along each bond in the molecule. In this manner, a collection of cells (akin to Voronoi polyhedra) is generated, each of which can be ascribed to an atom of DNA in each snapshot of a simulation. Water molecules are assigned to a particular atom if they fall in the
appropriate volume elements. Properly normalized radial distribution functions can then be calculated from the number of water molecules in each cell and the corresponding volume, and with these functions primary and secondary hydration shells can be detected. The detailed analysis is reported in [114]. In summary, the number of water molecules coordinated in the first shell by both dodecamers is practically the same, but there are important differences in the number of water molecules contained in the first and second hydration shells in the minor groove of the TATA (or TITI) elements. From the analysis, it appears that the grooves have very similar numbers of water molecules in the first shell, but the groove in MLP widens near the sugar-phosphate backbone and can host more water molecules than I in that region. The interaction energies between the water molecules in the first shell and the atoms in the minor groove are practically the same for both dodecamers. This is expected because the chemical identity of this surface is the same in the two sequences. On the other hand, MLP has 19 more water molecules than I in the minor groove, and this represents a large entropic advantage for MLP. Furthermore, if both hydration shells are restricted in mobility compared to bulk water, there should be a measurable difference in the heat capacity change upon binding to MLP compared to binding to I. This prediction can be tested experimentally, and such a determination of the heat capacity change for these two systems would help to define the extent of water perturbation caused by the minor groove surface.
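The core of the proximity idea, assigning each water to the DNA atom whose cell contains it, can be sketched with a nearest-atom rule, which is equivalent to unweighted Voronoi cells. This is a simplification: it ignores the details of the bisector construction in [115-117], represents each water by its oxygen, and uses an assumed 3.5 Å first-shell cutoff in place of the RDF-based shell boundaries. The function names are hypothetical.

```python
import numpy as np


def proximity_assign(solute_xyz, water_oxygen_xyz):
    """Assign each water (via its oxygen) to the nearest solute atom.

    Returns (atom_index, distance) arrays with one entry per water.
    """
    solute = np.asarray(solute_xyz, dtype=float)[None, :, :]       # (1, n_atoms, 3)
    water = np.asarray(water_oxygen_xyz, dtype=float)[:, None, :]  # (n_waters, 1, 3)
    dist = np.linalg.norm(water - solute, axis=-1)                 # (n_waters, n_atoms)
    idx = dist.argmin(axis=1)
    return idx, dist[np.arange(len(idx)), idx]


def count_shell(solute_xyz, water_oxygen_xyz, atom_indices, r_max: float = 3.5) -> int:
    """Number of waters assigned to the given atoms (e.g. minor groove atoms) within r_max (Å)."""
    idx, d = proximity_assign(solute_xyz, water_oxygen_xyz)
    mask = np.isin(idx, list(atom_indices)) & (d <= r_max)
    return int(mask.sum())
```

Averaging `count_shell` over all snapshots for the minor groove atoms of the TATA (or TITI) element would give a per-sequence water count of the kind compared above.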
2.4. Integration of the various contributions into mechanistic criteria for the formation of TBP-DNA complexes
The final energy balance that determines the relative affinities of TBP for different DNA sequences is composed of the various contributions to selectivity analyzed individually above. They apply simultaneously, either constructively or in opposite directions. The experimental measures of binding constants [107] reflect this final energy balance. According to these experimental data, the best TBP-binding sequences are represented by AT, E4, CYC1 and 6T. A common feature of these TBP binding sites is that they have at least six bp of alternating YR sequence. The detailed considerations discussed above indicate that this family of sequences is special because of its combination of static wedges at the appropriate positions (RY steps tend to have positive roll and low twist on average) and the high flexibility of YR steps, which exhibit a mild anisotropic bendability towards the major groove. The fact that these sequences have the highest affinity for TBP suggests that DNA flexibility is a dominant characteristic in determining the specificity of binding. Note, moreover, that because GC sequences have been found to have high flexibility of this kind, it may be possible to design a TBP mutant that can bind to GC, after judiciously eliminating all the steric clashes with the guanine amino group.
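The criterion of at least six bp of alternating YR sequence can be checked mechanically against the sequences in Table 3. The sketch below, an illustration rather than the authors' procedure, finds the longest stretch in which purines (A, G, and inosine I) and pyrimidines (C, T) alternate; it assumes a non-empty sequence, and the function name is hypothetical.

```python
PURINES = set("AGI")  # inosine (I) is treated as a purine


def longest_alternating_run(sequence: str) -> int:
    """Length in bp of the longest stretch where purines and pyrimidines alternate."""
    best = run = 1
    for prev, cur in zip(sequence, sequence[1:]):
        # The run grows only if the purine/pyrimidine character changes.
        run = run + 1 if (prev in PURINES) != (cur in PURINES) else 1
        best = max(best, run)
    return best


if __name__ == "__main__":
    table3 = {"AT": "ATATATATATAT", "E4": "CGTATATATACG", "CYC1": "GTATATAAAACG",
              "6T": "CTATATAAGGGC", "MLP": "CTATAAAAGGGC"}
    for name, seq in table3.items():
        print(name, longest_alternating_run(seq))  # AT 12, E4 12, CYC1 7, 6T 6, MLP 4
```

The four strongest binders all reach at least six, while MLP, whose TATA is followed by an adenine tract, reaches only four. GC also scores 12, in line with the remark that its flexibility would make it a candidate substrate for a suitably mutated TBP.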
Next in the binding affinity ranking are the MLP, MLP2 and CYC1B sequences, which also contain an alternating YR sequence that is, however, followed by a purine tract. One could also place R28 (an inverted version of MLP) in this family, as well as 2C, 7G and I. As shown in Figure 6, the purine tract is more rigid than the alternating YR region, and this could well account for the decrease in affinity. On the other hand, the purine tracts tend to lock into an A-DNA conformation, which is assumed to be on a productive pathway leading to TBP binding. Thus, energy considerations indicate that part of the work necessary to reach the conformation found in the complex has already been done for these sequences, and the entropy cost of locking the sugar rings in the North conformation has already been paid. The sequences in this group tend to have the highest structural homology to A-DNA in the collection of sequences we studied, and their affinity for TBP could be rationalized on this basis. (Note that a recent report analyzing the thermodynamics of TBP-DNA complex formation proposes an initial intermediate with an A-DNA conformation that subsequently isomerizes upon insertion of the two pairs of phenylalanine residues [60].) A comparison of the flexibility properties of adenine tracts (found in MLP) and inosine tracts (found in I) indicated that the former are more rigid than the latter, which would make I a better substrate for TBP than MLP. Nevertheless, as discussed in the previous section, MLP coordinates more water molecules in the minor groove than I, which more than compensates for the difference in rigidity of the purine tracts. In this particular comparison, hydration would appear to be a more important selectivity determinant than DNA flexibility.
3. DYNAMIC EFFECTS IN COMPLEX STABILIZATION
The available crystal structures of TBP-DNA complexes agree in presenting a very tight and rigid interface between TBP and the minor groove of DNA. The comparison of the conformation of TBP in its free and bound forms [22,24] showed a very modest structural rearrangement in TBP (a 5° rotation of one subdomain with respect to the other and a reduction in stirrup-stirrup distance), compared to the drastic conformational change in DNA. These considerations based on the crystal structures have focused attention on the changes in DNA during binding, with the implication that TBP is a passive and rigid restraint. The mechanistic analysis is now complemented by molecular dynamics simulations of TBP-DNA complexes that have enabled the study of the role of TBP dynamics in DNA recognition. Miaskiewicz and Ornstein carried out a 400 ps simulation of PDT025, with and without DNA [118], with AMBER 4.0 and the Weiner et al. potential [119], in which they identified a bending and twisting motion of the two subdomains of TBP. This motion has been invoked in subsequent articles as
part of the mechanism employed by TBP to open the minor groove of the TATA box [60,86]. They also found that TBP makes contacts with bases immediately 3' of the TATA box. Three very important residues, N99, L54 and L145, do not stay at the base where they were found in the X-ray structure, but wobble between two adjacent bases and their sugars. Our 2 ns CHARMM simulations of PDT025, 1YTB and 1AIS (without TFB), and of the corresponding free TBPs, serve to reexamine the inferences on the nature and role of TBP dynamics with results from simulations of additional systems that are more completely described (e.g., Miaskiewicz and Ornstein did not include the internal water molecules in the TBP structure, while our new calculations do include them [120]). In these new simulations we do not detect evidence for the collapse of the two subdomains of TBP caused by extreme bending. There are indeed oscillations in the distance between the tips of the stirrups of TBP [101] and twisting motions, but these oscillations never reach an amplitude that would cause the complete closing of the underside of TBP. This is also true for the three free TBP crystal structures simulated as monomers (NP, unpublished results). The striking result of the simulations of these complexes is the wealth of dynamics they have revealed, ranging from global motions to the spectrum of populations of side chain rotamers. For example, the hydrogen bonds located at the center of the TATA box, which had been studied with quantum mechanical methods (see section 2.1, above) and found to be indistinguishable in strength between the two possible A·T bp, are shown by the dynamics data to be very labile, as a result of a competition for the amino hydrogens of N9 and N99 between the rim of the bases and neighboring TBP H-bond acceptors (the hydroxyl oxygens of T64 and T155).
The extent of competition depends on the actual DNA sequence at that step, and is therefore different in the three simulated complexes. This finding makes TBP an active partner in the formation of the complex, and adds another layer of complexity to the analysis of specificity determinants. Experimental validation for the importance we are beginning to place on TBP dynamics comes from the prediction of the OH radical footprint pattern of SCE bound to MLP (Table 1). The predictions were based on solvent accessibility, and compared the X-ray structure PDT025 to results from trajectories of a molecular dynamics simulation of the same complex. The comparison is valid although there is no crystal structure for the particular combination of TBP and DNA sequence, because the residues involved in binding to DNA are 100% conserved between ATH and SCE. The key findings from this comparison are that both the crystal structures and a single average structure from the simulation failed to predict the correct reactivity at four nucleotides of the TATA box. Rather, the correct reactivity pattern is produced only when the fluctuations in the structure of TBP are taken into account through the analysis of the solvent accessibility in the MD trajectory (Pastor et al., in preparation).
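The lability and competition described above are usually quantified as hydrogen-bond occupancies, i.e. the fraction of trajectory frames in which a geometric criterion is met. The sketch below assumes commonly used criteria (donor-acceptor distance of at most 3.5 Å and a D-H...A angle of at least 120°), which are not taken from the text, and coordinates supplied as one row per frame; the function names are hypothetical.

```python
import numpy as np


def hbond_present(donor, hydrogen, acceptor, max_da: float = 3.5, min_angle: float = 120.0):
    """Per-frame boolean mask for a geometric hydrogen bond D-H...A.

    donor, hydrogen, acceptor: arrays of shape (n_frames, 3), in Å.
    """
    d, h, a = (np.asarray(x, dtype=float) for x in (donor, hydrogen, acceptor))
    dist_da = np.linalg.norm(a - d, axis=1)
    v1, v2 = d - h, a - h                      # vectors from H to donor and to acceptor
    cosang = (v1 * v2).sum(axis=1) / (np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1))
    angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return (dist_da <= max_da) & (angle >= min_angle)


def competing_occupancies(donor, hydrogen, acceptors: dict) -> dict:
    """Occupancy of one D-H group towards each candidate acceptor (e.g. base rim vs. Thr hydroxyl)."""
    return {name: float(np.mean(hbond_present(donor, hydrogen, xyz)))
            for name, xyz in acceptors.items()}
```

Comparing these occupancies across the three simulated complexes would express the sequence dependence of the competition as numbers.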
The extensive pattern of dynamics that emerges from our simulations, with TBP as an important contributor to complex formation, is also evident from the first structure of TBP obtained by NMR. The collection of 25 structures of a complex between TBP and an N-terminal domain of the largest TAFII in D. melanogaster, listed in Table 1 as 1TBA, provides evidence for the dynamic properties of both TBP and the TAF fragment, from the analysis of the different rotamers populated in the ensemble of structures. Such local structural variability is also observed for the side chains at the recognition interface, lending further credibility to the results obtained in the molecular dynamics simulations described above.
4. TOWARDS THE PREINITIATION COMPLEX ASSEMBLY
The formation of a functional preinitiation complex is heavily regulated [6,15], in keeping with the fact that most of the regulation of gene expression occurs at the level of transcription initiation. As mentioned in the introduction, TBP exists in the cell in association with TAFs, which are absolutely necessary for the response to most of the activators characterized so far. TBP on its own can promote transcription, but only at a basal level. The footprint generated at a core promoter in vitro by TFIID depends on the kind of TATA box present: if it is a very efficient TATA box (such as MLP), the footprint covers the TATA box and a few nearby bp [121]. On the other hand, for a poor TATA box the footprint may extend many bp upstream and downstream [122,123]. This has been interpreted on the assumption that TAFs have some DNA-binding activity towards the initiator and a downstream promoter element, and that these interactions help to stabilize the association with DNA when the TATA box sequence deviates significantly from the consensus [11]. In this respect, TFIIA is also known to stabilize TBP-DNA complexes, and to compete away proteins that inhibit the ability of TBP to bind to DNA [52,124,125]. The reported TBP-DNA structures are almost invariant in the geometry of the complex; nevertheless, there is evidence from gel retardation assays that the geometry of the complex in solution depends strongly on the TATA box sequence [47,126]. This is also relevant to understanding sequence specificity, because the difference in geometry is related to the stability of the complex and to its ability to be recognized by TFIIB and the rest of the transcription machinery [126].
A preliminary analysis of the simulations of three different TBP-DNA complexes suggests that TBP can indeed adjust its structure in response to the different dynamic properties of the bound DNA sequence [120], resulting in different angles between the incoming and outgoing segments of DNA. It is not immediately clear how to translate these alterations in complex structure into the disposition of TAFs and TFIIs around TBP. To
address this issue, ongoing work in our laboratories aims to characterize the differences in the relative positions of side chains that have been identified by mutagenesis to be involved in contacts with the TAFs and TFIIs, as a response to the interaction with different DNA sequences.
5. CONCLUDING REMARKS
The analysis of the different factors that contribute to sequence-specific recognition of TATA boxes by TBP leads to a complex picture of their interplay in determining the final binding affinity. Steric repulsion remains the strongest selectivity filter, effectively biasing TATA box composition to exclude G·C bp. For the remaining A·T-rich sequences, DNA flexibility appears to be the next most important factor, as the best TBP-binding sequences are the most flexible ones (i.e., sequences that include many pyrimidine-purine steps). For sequences with adenine tracts, the penalty in loss of flexibility is apparently balanced by their propensity for adopting the A-DNA-like structure that has been proposed to be an intermediate in the process of TBP binding. Further distinction among TBP-binding sequences is achieved by differential hydration of the minor groove surface, which must be completely dehydrated to form a stable complex. Thus, we found that differences in the number of bound water molecules can offset differences in flexibility. A key contribution of molecular dynamics simulations to the understanding of mechanisms of selectivity and affinity in TBP-DNA complexes is the discovery of the active role of TBP in the formation of the complex. The view derived from crystal structures was that of a passive role for TBP, which only imposed a steric constraint on DNA shape. It now appears from the simulations that TBP can respond to the dynamics of the bound DNA sequence by adjusting its interdomain geometry, and this might be relevant for the construction of the final preinitiation complex. Furthermore, many of the contacts characterized in the crystal structures were found in the simulations to have an important dynamic component, as side chains switch rotamers rather frequently.
This conformational freedom makes it possible for TBP to achieve suitable binding contacts with a variety of DNA moieties in a dynamic mode that contributes to enthalpic stabilization. However, the extent to which side chain dynamics are preserved in the complex depends on the local structure. Because this preservation reduces the entropy loss upon complex formation, it provides an additional source of sequence-dependent gain in affinity, revealed for the first time by the results of the molecular dynamics simulations.
6. ACKNOWLEDGEMENTS
We thank Dr. Leonardo Pardo for sustained discussions and collaborations on this topic. Some of the simulations reported here were performed at the Dirección General de Servicios de Cómputo Académico (UNAM).
L.A. Eriksson (Editor), Theoretical Biochemistry - Processes and Properties of Biological Systems, Theoretical and Computational Chemistry, Vol. 9. © 2001 Elsevier Science B.V. All rights reserved.
Chapter 11
A Multi-Component Model for Radiation Damage to DNA from Its Constituents
Stacey D. Wetmore,ᵃ Leif A. Erikssonᵇ and Russell J. Boydᵃ
ᵃDepartment of Chemistry, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4J3
ᵇDepartment of Quantum Chemistry, Uppsala University, Box 518, 751 20 Uppsala, Sweden
1. INTRODUCTION
While the significance of radicals in biological systems has been appreciated for decades, there is relatively little definitive experimental information on the identity of the radicals, and even less on the mechanisms by which they affect the physiology of living systems. This lack of detailed information follows directly from the fact that most radicals are highly reactive and therefore short-lived, transient species. Despite tremendous advances in spectroscopic and laser photolysis techniques, much less is known about radicals than about closed-shell species. The theoretical treatment of radicals, however, is only marginally more difficult than that of closed-shell molecules. For these reasons, the numerous applications of quantum chemical techniques to radicals have proven to be complementary to experimental studies. The large number of biologically important radicals, and the even greater number of reactions they undergo in vivo, provide an almost limitless list of interesting problems. Many biological radicals are formed when living matter is exposed to ionizing radiation. More specifically, radiation damages DNA. The primary products are base or sugar radicals, which subsequently lead to strand breaks and DNA-DNA or DNA-protein cross-links. Interest in the effects of radiation on DNA has grown for several reasons. For example, the beneficial effects of radiation therapy are achieved through alterations to DNA. There is also increasing concern about human exposure to higher levels of ultraviolet radiation due to the depletion of stratospheric ozone.
It is extremely difficult to study the effects of radiation on DNA by direct experimental methods. In many cases there is considerable uncertainty about which radicals are the main radiation products. Electron spin resonance (ESR) and electron nuclear double resonance (ENDOR) experiments on full DNA are complicated, so the most accurate studies are available for single crystals of the four DNA bases and related derivatives. However, even these low-temperature spectra are complicated by extensive hydrogen bonding in the crystal structures. Furthermore, because the generated radicals are structurally similar, the spectra contain many overlapping peaks. Consequently, the experimental identification of specific radicals is difficult and often involves many assumptions. This is a problem for which computational chemistry is an ideal complementary partner to experiment. Owing to advances in quantum chemical methods (density functional theory) and computer hardware, it is now possible to accurately predict the hyperfine coupling constants (HFCCs) of many biological radicals. HFCCs are the property used to identify radicals experimentally. The theoretical HFCCs can therefore assist with the interpretation of the results obtained from ESR and ENDOR experiments.
This chapter has two objectives. First, we review recent progress in the computation of the HFCCs of radicals that may be formed by radiation damage to the four DNA bases, thymine [1], cytosine [2], guanine [3] and adenine [4], as well as to the sugar moiety [5]. The theoretical values are compared with the most accurate data available from ENDOR and related experiments on single crystals. In many cases the computed and experimental values agree well, and this agreement is used to validate the level of theory used for the HFCC computations.
In a few cases there are discrepancies between the two data sets. For some of these, considering the conformational changes caused by hydrogen bonding and packing in the crystal leads to better agreement, since gas-phase calculations do not account for these effects. For the small number of cases where modifying the gas-phase structures does not resolve the discrepancies between computed and experimental results, we suggest that alternative assignments of the spectra, or new experiments, may be warranted. In one case, the discrepancy between the computed and experimental HFCCs has led to the proposal of a new mechanism for radiation damage. The second objective is to review the results of experimental studies on full DNA in the context of the computed and experimental results for single crystals. We conclude the chapter with a multi-component model for radiation damage to DNA that includes damage to the bases, the sugar moiety, the phosphate group and the surrounding water molecules. The model incorporates the results of many sophisticated experimental studies on full DNA and accounts for all known direct and indirect consequences of radiation damage to DNA. The model is expected to be useful for designing new experiments and for characterizing the ESR and ENDOR spectra of DNA. A full knowledge of the radicals generated upon irradiation of DNA is essential for determining the type of damage at the molecular level. That damage in turn governs the biological consequences (strand breaks, tandem lesions, DNA-protein cross-links, unaltered base release, etc.).

2. CHARACTERIZATION OF DNA RADIATION PRODUCTS
We have recently reported extensive calculations on all possible radicals formed from the four DNA bases, cytosine (C), thymine (T), guanine (G) and adenine (A), by net hydrogen atom addition (hydrogenated), net hydrogen atom removal (dehydrogenated) or net hydroxyl radical addition (hydroxylated) [1-4]. We have also studied all radicals formed by net hydrogen atom and net hydroxyl radical abstraction from a model of the sugar group present in DNA (deoxyribose, dR), as well as sugar radicals formed through more extensive damage pathways [5]. The calculations provide several important pieces of information: the relative energetics of products generated from each base through a similar mechanism, the spin density distributions and the HFCCs. The potential energy surfaces of possible radiation products were explored using Becke's three-parameter exchange functional (B3) [6], Lee, Yang and Parr's correlation functional (LYP) [7] and Pople's 6-31G(d,p) basis set [8]. Two sets of single-point calculations were then performed on the global minima. First, relative energies and spin densities were obtained with the B3LYP hybrid functional and Pople's 6-311G(2df,p) basis set [8]. Second, HFCCs were obtained with the following combination: Perdew and Wang's nonlocal exchange functional (PW) [9], Perdew's nonlocal correlation functional (P86) [10], Pople's 6-311G(2d,p) basis set [8], and the (5,4;5,4) family of auxiliary basis sets for fitting the charge density and the exchange-correlation potential. These calculations were carried out with the GAUSSIAN 94 [11] and deMon [12] program packages. This combination of methods has been used successfully in studies of model π-radicals [13]. The details of HFCC calculations have been reviewed on several occasions [14] and will not be repeated here. However, it is important to understand that the HFCC has two contributions: the isotropic component (Aiso) and the anisotropic component (Txx, Tyy, Tzz).
Adding Aiso to each component of the anisotropic tensor gives the principal components of the hyperfine tensor (Axx, Ayy, Azz). Accurate isotropic HFCCs require both a good description of electron correlation and an adequate basis set. Even when these computational demands are satisfied, theoretical results may deviate from experiment by more than 20%. Anisotropic HFCCs, on the other hand, can be calculated accurately even at lower levels of theory. More importantly, the calculated anisotropic components of hydrogen HFCCs are often within 5-10% of the experimental values, and hydrogen couplings are the most abundant data available for biological systems. Comparison of anisotropic hyperfine tensors can therefore serve as a reliable guide to the radical site, even when agreement for the isotropic component is less satisfactory. HFCCs are given in gauss throughout; gauss and megahertz are related by a simple conversion factor (1 G = 2.8025 MHz).

The atomic numbering of the nucleobases used throughout is shown in Figure 1. A few examples of the notation used for DNA radicals follow. The cytosine anion and guanine cation are denoted C•− and G•+. Radicals formed by net addition of OH to C5 in cytosine or thymine are denoted C(C5OH) and T(C5OH), respectively. Similarly, radicals formed by net hydrogen atom addition to N3 in adenine or to C6 in thymine are referred to as A(N3H) and T(C6H), respectively. Radicals formed by net hydrogen atom removal from the methyl group in thymine or the amino group in guanine are denoted T(CH2) and G(N2H), respectively. Some guanine and adenine crystals examined experimentally yield protonated radicals. Examples are the protonated N6-dehydrogenated adenine radical [A(N6H+)] and the protonated C8-hydrogenated guanine radical [G(C8H+)]. Radicals formed by net hydrogen abstraction from the C5' or O3' position in the sugar group are referred to as C5'• and O3'•, respectively. The north and south puckering modes (Section 2.3) of the C3'• radical are distinguished as C3'•(N) and C3'•(S), respectively. The notation for more complex sugar radicals will be introduced as required.

Figure 1: Structure and chemical numbering in the four DNA bases (R = H; thymine, cytosine, adenine and guanine) and the sugar group (deoxyribose).

Some of the results obtained for the numerous radicals investigated are presented in the remainder of this section. The discussion is divided into base radicals and sugar radicals. The base radicals are further divided into three groups:
- radicals for which theory and experiment are in good agreement;
- radicals for which external influences must be considered to obtain agreement;
- radicals for which considering external influences does not improve the poor agreement between theory and experiment.
The examples were chosen to illustrate the level of agreement with experiment and the type of complementary information the calculations provide. For full details of the computations, the interested reader is referred to the original series of theoretical papers [1-5]. Only limited experimental data are presented here, as most experiments gave similar results. For a complete list of experimental papers, please refer to the original theoretical work and/or an excellent review covering experimental work up to 1993 [15].
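The HFCC conventions described earlier in this section can be made concrete with a short Python sketch. It covers three points: the principal components are Axx = Aiso + Txx (and likewise for y and z); the anisotropic (dipolar) tensor is traceless up to rounding; and 1 G = 2.8025 MHz. This is an illustration only, not the authors' code. The function names are hypothetical, and the percent-deviation measure is an assumed way of quantifying agreement with experiment. The worked numbers are the N2H entries for the G(N2H) radical in Table 1.

```python
"""Illustrative HFCC bookkeeping (hypothetical helper names, not the authors' code)."""

G_TO_MHZ = 2.8025  # conversion factor quoted in the text: 1 G = 2.8025 MHz


def principal_components(a_iso, t_aniso):
    """Return (Axx, Ayy, Azz): Aiso added to each anisotropic component, in G."""
    return tuple(round(a_iso + t, 2) for t in t_aniso)


def anisotropic_trace(t_aniso):
    """Txx + Tyy + Tzz; the dipolar tensor is traceless, so this is ~0 up to rounding."""
    return round(sum(t_aniso), 2)


def gauss_to_mhz(value_g):
    """Convert a coupling from gauss to megahertz."""
    return value_g * G_TO_MHZ


def mhz_to_gauss(value_mhz):
    """Convert a coupling from megahertz to gauss."""
    return value_mhz / G_TO_MHZ


def percent_deviation(theory, experiment):
    """Assumed agreement measure: |theory - experiment| / |experiment| in percent."""
    return 100.0 * abs(theory - experiment) / abs(experiment)


if __name__ == "__main__":
    # N2H coupling in the G(N2H) radical, values in G as listed in Table 1.
    theory = {"Aiso": -7.6, "T": (-6.6, -1.2, 7.7)}
    expt = {"Aiso": -9.6, "T": (-6.9, -0.9, 7.8)}

    print("Theory A:", principal_components(theory["Aiso"], theory["T"]))
    print("Expt   A:", principal_components(expt["Aiso"], expt["T"]))
    print("Trace of theoretical T:", anisotropic_trace(theory["T"]))
    print("Theoretical Aiso in MHz: %.2f" % gauss_to_mhz(theory["Aiso"]))
    print("Aiso deviation: %.1f%%" % percent_deviation(theory["Aiso"], expt["Aiso"]))
    print("Tzz deviation:  %.1f%%" % percent_deviation(theory["T"][2], expt["T"][2]))
```

With these values the sketch gives principal components of (-14.2, -8.8, 0.1) G from theory and (-16.5, -10.5, -1.8) G from experiment. The isotropic coupling deviates from experiment by about 21%, while Tzz deviates by about 1%. This is consistent with the point made above that anisotropic components are the more reliable guide to the radical site.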
2.1 Pyrimidine and purine radiation products: close agreement between experiment and theory

Typical theoretical results obtained for a wide variety of base radicals are illustrated here through a discussion of five radicals (Table 1). The first data block in Table 1 contains the experimental and theoretical HFCCs for the radical formed through net hydrogen atom addition to C5 in thymine [T(C5H)], which displays interesting geometrical effects. This radical is distorted at C5 while the rest of the ring remains planar, leading to two possible orientations for the additional hydrogen atom: pseudo-axial and pseudo-equatorial (Figure 2). The radical with the hydrogen in the pseudo-axial position, almost perpendicular to the molecular plane, is slightly lower in energy (by 0.7 kcal/mol). This agrees with the most stable conformation observed for 5,6-dihydrothymine [16]. A spin density of 0.79 at C6 leads to a large isotropic coupling for the out-of-plane C5H in the pseudo-axial position (41.9 G) and a smaller coupling for C6H (-15.9 G). These calculated couplings match the experimental data well: experimentally, a large coupling was assigned to a β-hydrogen oriented perpendicular to the thymine base [17]. A second coupling was experimentally assigned to the hydrogen at the C6 position, and the spin density at C6 was estimated to be 0.75, in good agreement with the calculated value (0.79). The isotropic couplings in T(C5H) with the added hydrogen in the pseudo-equatorial position (C5H = 16.0 G; C6H = -16.4 G) confirm that the hydrogen at C5 in the observed radical is in a pseudo-axial position. This example illustrates the additional geometrical information that can be obtained from the calculations.

Figure 2: Pseudo-axial (I) and pseudo-equatorial (II) T(C5H) conformations.

Table 1: Theoretical and experimental HFCCs (G) in radicals for which good agreement between the two data sets is observed.

                         Theory                         Experiment(a)
Radical    Atom      Aiso    Txx    Tyy    Tzz      Aiso    Txx    Tyy    Tzz
T(C5H)     C6H      -15.9  -11.2   -0.2   11.4     -19.2  -11.2    1.0   10.2
           C5H       41.9   -1.5   -1.1    2.5      48.6   -1.7   -0.7    2.4
T(CH2)     N1H       -2.5   -1.8   -1.1    2.9
           N3H        0.1   -0.4    0.1    0.4      -1.0   -0.9   -0.8    1.7
           C6H      -11.4   -5.4   -0.7    6.1     -10.7   -4.8   -0.4    5.3
           C5-CH    -15.1   -8.9   -0.1    9.0     -16.4   -9.0    0.6    8.4
           C5-CH    -14.1   -8.1   -0.5    8.6     -15.7   -8.1    0.3    7.9
A(N6H)     N6H      -11.8   -9.7   -2.0   11.8     -11.5   -8.3   -1.2    9.4
           C8H       -4.0   -2.3   -0.3    2.6      -4.6   -2.4   -0.2    2.6
G(C5H)     C5H       49.5   -0.7   -0.5    1.2      54.0   -1.0   -0.7    1.7
G(N2H)     N2H       -7.6   -6.6   -1.2    7.7      -9.6   -6.9   -0.9    7.8
           C8H       -6.0   -3.4   -0.3    3.7      -4.9   -2.6   -0.2    2.9
(a) References for experimental data: T(C5H) and T(CH2), reference 17; A(N6H), reference 20; G(C5H) and G(N2H), reference 21.

The C5-methyl dehydrogenated thymine radical [T(CH2)] has been observed in almost every ESR study on thymine derivatives to date [15] and was calculated to be the lowest energy radical formed by net hydrogen atom removal from thymine. Experimentally [17], this radical is characterized in thymine crystals by two methyl hydrogen isotropic HFCCs (-15.7 and -16.4 G) and a small C6H isotropic coupling (-10.7 G). The corresponding theoretical isotropic couplings are -14.1, -15.1 and -11.4 G, respectively. In addition, the anisotropic HFCCs agree closely for all three allylic protons. An additional weak coupling (Aiso = -1.0 G) was assigned to N3H, and an estimated spin density of 0.04 was assigned to N3. The calculated spin densities indicate a smaller amount of spin on N3 (-0.01) and a greater amount on N1 (0.08). Although the calculated isotropic couplings are small in both cases, this suggests that the experimental coupling is due to the hydrogen at N1 (-2.5 G) rather than the one at N3 (0.1 G). The experimentally derived anisotropic HFCCs fall in between those calculated for N1H and N3H, so examination of the anisotropic HFCCs does not help to assign the coupling to either atom.
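The "in between" argument for the weak T(CH2) coupling can be checked component by component. The sketch below is illustrative and not part of the original analysis: it takes the anisotropic tensors from Table 1, checks that each is traceless to within the 0.1 G rounding of the table (the 0.15 G tolerance is an assumption), and tests whether every experimental principal value lies between the values calculated for N1H and N3H. The function names are hypothetical.

```python
# Anisotropic principal values (Txx, Tyy, Tzz) in G for T(CH2), taken from Table 1.
T_N1H_calc = (-1.8, -1.1, 2.9)
T_N3H_calc = (-0.4, 0.1, 0.4)
T_exp = (-0.9, -0.8, 1.7)  # experimental tensor, assigned to N3H in reference 17


def is_traceless(tensor, tol=0.15):
    """An anisotropic hyperfine tensor has zero trace; tol allows for 0.1 G rounding."""
    return abs(sum(tensor)) <= tol


def lies_between(exp, calc_a, calc_b):
    """True if every experimental component falls between the two calculated ones."""
    return all(min(a, b) <= e <= max(a, b) for e, a, b in zip(exp, calc_a, calc_b))


for name, t in [("N1H calc", T_N1H_calc), ("N3H calc", T_N3H_calc), ("exp", T_exp)]:
    print(name, "traceless:", is_traceless(t))
print("experiment between N1H and N3H:", lies_between(T_exp, T_N1H_calc, T_N3H_calc))
```

All three tensors pass the trace check and the experimental tensor sits between the two calculated ones in every component, which is why the anisotropy alone cannot decide between N1H and N3H.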
The large anisotropic couplings (-11.7, 4.7, 6.9 G) assigned experimentally to the N6-dehydrogenated adenine radical [A(N6H)] in co-crystals of adenosine and 5-bromouracil (rA:5BrU) were speculated to arise from hydrogen-bonding interactions in the crystal, where the remaining N6 hydrogen is hydrogen bonded to a uracil oxygen [18]. The calculated geometry of the N6-dehydrogenated radical is planar, with the remaining amino hydrogen also located in the molecular plane. The calculations indicate that this radical indeed possesses a large isotropic N6H coupling (-11.8 G) with significant anisotropy (-9.7, -2.0, 11.8 G). Differences from the experimental anisotropic results obtained in rA:5BrU may arise from hydrogen bonding in the crystal structures. Crystal effects, such as hydrogen bonding and crystal packing, must play an important role, since even the experimentally determined anisotropic HFCCs obtained from rA:5BrU (-11.7, 4.7, 6.9 G), adenosine (rA) [19] (-9.1, 1.2, 7.9 G) and anhydrous deoxyadenosine (dA) [20] (-8.3, -1.2, 9.4 G) differ substantially. It is clear from the present calculations (Table 1) that the N6H coupling tensor is large even in the absence of hydrogen bonding. Overall, the magnitude of the calculated N6H anisotropic HFCCs, together with the C8H data, supports the experimental assignment of this radical.

The radical formed through net hydrogen addition to C5 in guanine [G(C5H)] was identified in detailed work on 2'-deoxyguanosine 5'-monophosphate (5'dGMP) [21]. The experimental study indicated that C5H has a very large isotropic coupling (54.0 G) and a very small anisotropic coupling tensor (-1.0, -0.7, 1.7 G). The C5-hydrogenated radical was calculated to be in a "butterfly" conformation (Figure 3), in which the pyrimidine and imidazole rings remain planar but are tilted towards each other about the C4C5 double bond [22]. A higher energy conformer (not examined in the present work) has the rings tilted to opposite sides of the C4C5 bond [22]. The experimental anisotropic coupling tensor is in good agreement with the calculated tensor (-0.7, -0.5, 1.2 G). The calculated isotropic C5H coupling (49.5 G) also supports the experimental assignment of the observed spectrum to G(C5H) and verifies that C5H is located perpendicular to the C4C5 bond in the "butterfly" conformation.

Figure 3: "Butterfly" conformation of G(C5H).

The only radical formed through net hydrogen atom abstraction that has been identified in nonprotonated guanine crystals is the N2-dehydrogenated radical [G(N2H)]. This radical has been observed in 5'dGMP [21,23] and guanosine 3',5'-cyclic monophosphate (3'5'cGMP) [24], and all experimental couplings are in excellent agreement with one another. The C8 and N2 spin densities in all samples were determined to be approximately 0.17 and 0.33, respectively (calculated values: 0.19 and 0.37). The N3 spin density (0.31) was determined in the study of 3'5'cGMP crystals (calculated value: 0.35) [24]. The experimental couplings for the N2-dehydrogenated radical obtained in the various studies are remarkably similar. The C8H coupling tensor consists of an average isotropic component of -4.9 G and an average anisotropic component of (-2.5, -0.2, 2.7 G), which are only in fair agreement with the calculated values (Aiso = -6.0 G; Tii = -3.4, -0.3, 3.7 G). The remaining amino hydrogen was also observed in the experimental studies, which gave an isotropic HFCC, averaged over the three studies, of -9.6 G. The magnitude of this coupling is larger than that of the N2H coupling obtained from DFT (-7.6 G). However, comparison of the experimental (-6.9, -1.0, 7.8 G) and calculated (-6.6, -1.2, 7.7 G) anisotropic N2H coupling tensors supports the experimental assignment of the spectrum to
G(N2H). These five examples illustrate a level of agreement between theory and experiment that is considered more than sufficient for the calculations to verify the experimental radical assignments. It is important to stress once again that the anisotropic couplings can be calculated to a greater degree of accuracy than the isotropic couplings, for which some error is expected. This trend is nicely portrayed in Table 1. It is also interesting to note that for the larger (adenine and guanine) radicals the agreement with experiment is not as good as for the smaller (thymine) radicals. This can be attributed to the greater difficulty of describing the larger systems theoretically. However, all examples discussed above show that the level of theory chosen is suitable for evaluating HFCCs in DNA radicals. In the next section, a selection of radicals will be discussed for which the agreement between theory and experiment is initially poor, until the differences between the two data sources (gas phase versus a crystalline environment with extensive hydrogen bonding and possible crystal packing effects) are taken into account.

2.2 Pyrimidine and purine radiation products: problematic cases

For most of the thymine radicals considered, excellent agreement between theory and experiment was observed, as shown for T(C5H) and T(CH2) (Section 2.1). Thus, it was surprising that poor results were obtained for the O4-hydrogenated product [T(O4H)]. However, comparison of the experimental and calculated HFCCs in T(O4H) indicates that good agreement between the two data sets is obtained for all of the HFCCs except the isotropic O4H coupling (Table 2). The spin density in this molecule was concluded from experimental data to reside predominantly on C6 (0.50) and C4 (0.40), with a small amount on C5 (0.08). This is in good agreement with the calculated results obtained from a Mulliken population analysis (0.56, 0.36 and -0.12 on C6, C4 and C5, respectively), indicating that the level of theory implemented gives an accurate description of the spin distribution in this radical.

Table 2: Theoretical and experimental HFCCs (G) in radicals that exhibit poor agreement between the two data sets.

                         Theory                         Experiment(a)
Radical    Atom      Aiso    Txx    Tyy    Tzz      Aiso    Txx    Tyy    Tzz
T(O4H)     N3H       -3.4   -2.9   -1.0    3.9      -2.1   -2.5   -0.7    3.1
           O4H       -1.6   -1.7   -1.6    3.3      12.3   -2.6   -2.5    5.1
           C6H      -15.1   -8.5   -0.4    8.1     -14.2   -8.2    1.0    7.2
           C5-CH     -4.0   -0.7   -0.4    1.3      -2.6   -0.6   -0.4    1.1
A(N3H)     C2H      -12.9   -7.3    0.4    6.9     -10.6   -5.9    0.2    5.7
           N3H       15.2   -2.9   -1.5    4.4      -3.9   -3.1   -0.9    4.0
           C8H       -3.0   -1.9   -0.2    2.1      -4.4   -2.4    0.3    2.1
A(N3H+)    C2H        2.2   -9.4   -0.9   10.3     -14.2  -10.0    1.0    9.0
G(O6H+)    N1H                                      -3.2   -2.4   -0.5    2.9
           O6H       22.0   -1.5   -1.2    2.7
           N7H       -2.4   -1.5   -1.2    2.7      -2.8   -1.6   -0.9    2.4
           C8H      -11.0   -6.2    0.3    5.9      -8.1   -4.0   -0.6    4.5
(a) References for experimental data: T(O4H), reference 17; A(N3H), reference 18; A(N3H+), reference 26; G(O6H+), reference 30.

The question remains as to why the isotropic O4H couplings do not correspond. Experimentally, the relatively large coupling (12.3 G) assigned to O4H was speculated to be due to an out-of-plane position for this atom. Semi-empirical calculations performed by Sagstuen et al. [17] support this initial prediction of an out-of-plane hydrogen configuration. B3LYP, in contrast, predicts O4H to lie in the molecular plane (structure I, Figure 4), a configuration that results in a very small HFCC (-1.6 G). The effects of an out-of-plane position on the O4H HFCCs were investigated through single-point calculations in which the ring geometry was held fixed, as it is not expected to change considerably, and the HO4C4C5 dihedral angle (θ) was varied in steps of ten degrees out of the molecular plane. These single-point calculations (Table 3, left columns) indicate that the isotropic O4H HFCC depends strongly on the dihedral angle, with a maximum HFCC (≈ 22 G) at an angle of 90° out of the molecular plane. The rotational barrier is very small: the energy difference is approximately 2 kcal/mol between the in-plane position and the position 90° out of the plane, and 5 kcal/mol when the hydrogen is cis relative to the C4N3 bond. The rotation does not modify the spin distribution in the radical. A calculated O4H coupling close to the experimental value is obtained at an angle of approximately 50° out of the molecular plane.
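The strong angular dependence of Aiso(O4H) can be summarized with a two-parameter fit. The sketch below assumes a Heller-McConnell-type form, A(θ) = a + b·sin²θ, where θ is the HO4C4C5 dihedral angle measured from the molecular plane (so sin²θ is the cos² of the angle to the ring-normal p orbital). This model is an illustrative assumption and is not used in the original work; the data are the "methyl group optimized" Aiso(O4H) values of Table 3.

```python
import math

# Table 3, "methyl group optimized" columns: HO4C4C5 dihedral angle (deg) and Aiso(O4H) (G).
theta_deg = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]
aiso_o4h = [-1.6, 0.7, 7.2, 15.1, 21.6, 23.4, 19.2, 11.0, 2.3, -1.7]


def fit_sin2(angles_deg, couplings):
    """Ordinary least-squares fit of A(theta) = a + b * sin^2(theta); returns (a, b)."""
    x = [math.sin(math.radians(t)) ** 2 for t in angles_deg]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(couplings) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, couplings))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


a, b = fit_sin2(theta_deg, aiso_o4h)
print(f"A(theta) ~ {a:.1f} + {b:.1f} sin^2(theta) G")
print(f"predicted maximum (theta = 90 deg): {a + b:.1f} G")
```

With these data the fit is roughly A(θ) ≈ -1.4 + 24.8 sin²θ G, so the model maximum near 90° is close to the 22-23 G values in Table 3; the misfit of 1-2 G at intermediate angles shows that the form is only approximate.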
Figure 4: Conformations of T(O4H).

The difference between theory and experiment for the geometry of T(O4H) arises from the rapid rotation of the methyl group in the experimental environment, which is characterized by the presence of three equivalent methyl group protons in the ENDOR spectra. When the methyl group is allowed to rotate, the O4-hydrogen and the in-plane methyl hydrogen are only 1.62 Å apart in the calculated geometry (structure II, Figure 4). This unfavorable interaction [17], together with the unfavorable interaction with the N3-hydrogen, is expected to result in an out-of-plane position for O4H in the crystals. This hypothesis is confirmed by additional single-point calculations in which the HO4C4C5 dihedral angle is varied as before, but with the methyl group fixed in a staggered orientation with respect to the C5C6 double bond (Table 3, right columns). In this case, the lowest energy orientation for O4H is at an angle of approximately 50-60° out of the molecular plane (θ = 120-130°), the same position that yields the experimentally determined O4H HFCC.

Table 3: HFCCs (G) and relative energies (kcal/mol) obtained for T(O4H) through examination of methyl group rotation.

Dihedral      Methyl group optimized          Methyl group rotated
angle θ (°)   Aiso(O4H)   Rel. energy         Aiso(O4H)   Rel. energy
   0            -1.6          0.0               -1.7          1.6
  20             0.7          0.2                0.6          1.3
  40             7.2          0.6                7.2          1.0
  60            15.1          1.2               15.6          0.6
  80            21.6          1.7               21.7          0.3
 100            23.4          2.0               22.8          0.1
 120            19.2          2.5               18.3          0.0
 140            11.0          3.3               10.1          0.4
 160             2.3          4.6                1.9          1.1
 180            -1.7          5.2               -1.7          1.5

The results obtained for the thymine O4-hydrogenated radical can be extended to 1-methylthymine and deoxythymidine, since geometrical and electronic changes are expected to be small upon substitution at the N1 position. Comparison of calculated and experimental HFCCs indicates that the O4-hydrogen remains in the molecular plane in 1-methylthymine crystals [25] and lies at an angle of approximately 60° out of the molecular plane in deoxythymidine crystals [15,25]. The differences in these systems relative to unsubstituted thymine arise from the characteristic hydrogen-bonding patterns in the crystals.
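The angles quoted above can be recovered from Table 3 by linear interpolation between the tabulated 20° steps. This is a minimal sketch that assumes Aiso(O4H) varies linearly between grid points (an approximation, since the underlying curve is smooth). It locates the dihedral angles at which the calculated coupling equals the experimental 12.3 G in both scans, and the lowest-energy angle in the methyl-rotated scan.

```python
# Table 3: dihedral angle theta (deg), Aiso(O4H) (G) and relative energy (kcal/mol).
theta = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]
aiso_optimized = [-1.6, 0.7, 7.2, 15.1, 21.6, 23.4, 19.2, 11.0, 2.3, -1.7]
aiso_rotated = [-1.7, 0.6, 7.2, 15.6, 21.7, 22.8, 18.3, 10.1, 1.9, -1.7]
energy_rotated = [1.6, 1.3, 1.0, 0.6, 0.3, 0.1, 0.0, 0.4, 1.1, 1.5]

A_EXP = 12.3  # experimental isotropic O4H coupling (G), reference 17


def crossings(xs, ys, target):
    """Return x values where the piecewise-linear curve through (xs, ys) equals target."""
    hits = []
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if y0 != y1 and min(y0, y1) <= target <= max(y0, y1):
            hits.append(x0 + (target - y0) * (x1 - x0) / (y1 - y0))
    return hits


for label, values in [("optimized", aiso_optimized), ("rotated", aiso_rotated)]:
    for t in crossings(theta, values, A_EXP):
        # For theta > 90 deg the hydrogen is past the normal: out-of-plane angle = 180 - theta.
        print(f"{label}: theta = {t:.1f} deg ({min(t, 180 - t):.1f} deg out of plane)")

best = theta[energy_rotated.index(min(energy_rotated))]
print("lowest-energy theta with methyl group rotated:", best, "deg")
```

For the optimized-methyl scan this gives θ ≈ 52.9° and θ ≈ 136.8° (about 43° out of the plane on the other side), and the rotated scan has its energy minimum at θ = 120°, consistent with the 50-60° estimate in the text.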
Figure 5: Calculated distortion in the gas-phase A(N3H) radical.
The radical formed by net hydrogen addition to N3 in adenine [A(N3H)] undergoes geometrical alterations upon formation. The N3 hydrogen is located out of the molecular plane, and the amino group is puckered, with both hydrogens displaced out of the plane (Figure 5). The experimental C2H and C8H isotropic HFCCs, as well as the anisotropic tensors, are in good agreement with the calculated results for this radical (Table 2). However, only a small N3H coupling has been observed experimentally for this radical (-3.9 G), whereas a large HFCC (15.2 G) was calculated owing to the distortion at N3. It is possible that hydrogen bonding or packing effects in the crystal force the N3 hydrogen to remain in the molecular plane, which would lead to a small isotropic HFCC and explain why the N3H coupling is not observed in all experimental studies. This hypothesis can be tested through examination of a fully optimized Cs structure, which lies only 1.7 kcal/mol above the nonplanar arrangement and possesses two imaginary frequencies. The spin distribution and the C2H and C8H HFCCs (Table 4) of the planar radical are very similar to those calculated for its puckered form (Table 2). The main difference in the computed couplings is the magnitude of the N3H isotropic HFCC. In the Cs N3-hydrogenated radical, the N3H isotropic component was calculated to be -3.6 G, in much better agreement with experiment (-3.9 G) than the value calculated for the puckered form (15.2 G). Hence, it can be concluded that in crystals where the N3H coupling was detected, A(N3H) is likely to remain planar.

The N1-protonated form of A(N3H) [A(N3H+)] has been observed in crystals of adenine hydrochloride hemihydrate (A:HCl:½H2O) [26]. The HFCCs in this protonated radical follow a similar pattern to those discussed for A(N3H). The gas-phase geometry of A(N3H+) is distorted at C2, which leads to C2H HFCCs in poor agreement with experiment (Table 2). However, upon consideration of a planar structure, the HFCCs are in good agreement with the experimental results (Table 4). These examples illustrate how hydrogen bonding and/or crystal packing can affect the radical geometry and therefore, indirectly, the HFCCs.

Table 4: Comparison between experimental HFCCs (G) and those calculated for planar radicals.

                         Theory                         Experiment(a)
Radical    Atom      Aiso    Txx    Tyy    Tzz      Aiso    Txx    Tyy    Tzz
A(N3H)     C2H      -14.0   -7.9    0.2    7.7     -10.6   -5.9    0.2    5.7
           N3H       -3.6   -3.3   -1.1    4.5      -3.9   -3.1   -0.9    4.0
           C8H       -5.1   -2.9    0.2    2.7      -4.4   -2.4    0.3    2.1
A(N3H+)    C2H      -18.5  -10.7    0.4   10.4     -14.2  -10.0    1.0    9.0
G(O6H+)    N1H       -2.6   -2.2   -0.9    3.1      -3.2   -2.4   -0.5    2.9
           O6H       -1.5   -1.5   -1.5    2.9
           N7H       -3.2   -2.3   -1.3    3.6      -2.8   -1.6   -0.9    2.4
           C8H      -12.8   -7.2    0.4    6.8      -8.1   -4.0   -0.6    4.5
(a) References for experimental data: A(N3H), reference 18; A(N3H+), reference 26; G(O6H+), reference 30.

The N7-protonated O6-hydrogenated guanine radical [G(O6H+)] has been observed in studies on crystals of guanine hydrochloride monohydrate (G:HCl:H2O) [27], the free acid of guanosine 5'-monophosphate (5'GMP(FA)) [28], guanine hydrochloride dihydrate (G:HCl:2H2O) [29] and guanine hydrobromide monohydrate (G:HBr:H2O) [30]. The geometry was calculated to exhibit distortions at C6 (Figure 6), where O6H is located out of the molecular plane; this results in a large isotropic O6H coupling, which was not recorded experimentally. The calculated coupling for the hydrogen at C8 is also large, while the corresponding experimental coupling is small (Table 2). Even the anisotropic couplings for this radical are not in agreement. Thus, it seems unlikely that the N7-protonated O6-hydrogen addition radical, in this geometry, is responsible for the spectra observed in these studies. Since hydrogen-bonding interactions or crystal packing effects may result in a planar geometry, as discussed for A(N3H) and A(N3H+), a Cs radical was obtained through a full optimization; it possesses one imaginary frequency and lies 1.7 kcal/mol higher in energy than the nonplanar radical. Calculations on the planar species (Table 4) yield a small O6H coupling, consistent with its absence from the experimental spectra, so the agreement between the calculated couplings and experiment could be considered improved over that for the nonplanar radical. Additionally, an N1H coupling was calculated for the planar radical that was not obtained for the nonplanar form.
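The comparison between the puckered and planar A(N3H) structures can be condensed into a single number. The sketch below computes a root-mean-square (RMS) deviation over all isotropic and anisotropic components of the C2H, N3H and C8H couplings in Tables 2 and 4. The RMS score and the function name are illustrative choices, not a criterion used in the original work.

```python
import math

# (Aiso, Txx, Tyy, Tzz) in G for A(N3H): experiment, nonplanar theory (Table 2), planar theory (Table 4).
experiment = {
    "C2H": (-10.6, -5.9, 0.2, 5.7),
    "N3H": (-3.9, -3.1, -0.9, 4.0),
    "C8H": (-4.4, -2.4, 0.3, 2.1),
}
nonplanar = {
    "C2H": (-12.9, -7.3, 0.4, 6.9),
    "N3H": (15.2, -2.9, -1.5, 4.4),
    "C8H": (-3.0, -1.9, -0.2, 2.1),
}
planar = {
    "C2H": (-14.0, -7.9, 0.2, 7.7),
    "N3H": (-3.6, -3.3, -1.1, 4.5),
    "C8H": (-5.1, -2.9, 0.2, 2.7),
}


def rms_deviation(theory, exp):
    """RMS difference (G) over every component of every atom present in exp."""
    diffs = [t - e for atom in exp for t, e in zip(theory[atom], exp[atom])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


for label, calc in [("nonplanar", nonplanar), ("planar (Cs)", planar)]:
    print(f"{label}: RMS deviation = {rms_deviation(calc, experiment):.2f} G")
```

The planar structure scores about 1.3 G against about 5.6 G for the puckered one, and almost all of the difference comes from the N3H isotropic term. Because an unweighted RMS lets one large coupling dominate, the per-atom deviations are worth inspecting as well.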
Figure 6: Calculated distortion in the gas-phase G(O6H+) radical.

However, the experimental and calculated couplings disagree in the magnitude of the C8H coupling: the calculated HFCCs are too large relative to those obtained in the experimental study. The possibility that the observed radical is not protonated can be eliminated. In particular, the C8H HFCC for the planar nonprotonated O6-hydrogenated radical (Aiso = -3.9 G; Tii = -2.3, -0.1, 2.3 G) is different from that assigned to the N7-protonated O6-hydrogenated radical. Furthermore, clear couplings were observed experimentally for N7H. To ensure that the differences between the calculated and experimental C8H couplings for the N7-protonated O6-hydrogenated radical do not arise from differences in the hydrogen-bonding environment at N7, a series of calculations was performed in which the N7H bond was lengthened [3]. The N1H, O6H and C8H couplings did not change over the range of N7H bond lengths investigated (0.908-1.308 Å). In contrast, the N7H anisotropic couplings decrease in magnitude as the bond length increases. Despite the great difference between the C8H couplings of the planar protonated and nonprotonated radicals, neither matches the coupling assigned to the protonated O6-hydrogenated radical. However, the average of these couplings (Aiso = -8.4 G; Tii = -4.8, 0.3, 4.6 G) is in good agreement with the experimental results (Aiso = -8.1 G; Tii = -4.0, -0.6, 4.5 G). Moreover, the average calculated N1H coupling (Aiso = -2.8 G; Tii = -2.4, -1.1, 3.5 G) also agrees with experiment (Aiso = -3.2 G; Tii = -2.4, -0.5, 2.9 G). Any discrepancies between the experimental and calculated N7H couplings can likewise be explained in terms of differences in the N7H bond length: the experimental N7H HFCCs agree better with calculations performed at longer bond lengths than with those at the optimized geometry [3]. Thus, a possible explanation for the observed spectra is either an averaging recorded in the spectra owing to temperature effects, which cause N7H to vibrate or rotate, or an extreme example of the effects of hydrogen bonding on the HFCCs. In either case, this example illustrates how hydrogen bonding in the crystals can directly affect HFCCs. Additionally, this discussion demonstrates how calculations can be used to view experimental spectra in a new light.
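The averaging argument for G(O6H+) can be written as a population-weighted mean of the C8H isotropic couplings of the planar protonated radical (-12.8 G, Table 4) and the planar nonprotonated radical (-3.9 G). The sketch assumes fast exchange between the two forms, so that the observed coupling is a linear average. Equal weights reproduce the average quoted above, and inverting the same relation gives the protonated fraction that would match the experimental -8.1 G exactly; that fraction is a hypothetical illustration, not a result of the original study.

```python
A_PROTONATED = -12.8     # planar G(O6H+), C8H Aiso (G), Table 4
A_NONPROTONATED = -3.9   # planar nonprotonated O6-hydrogenated radical, C8H Aiso (G)
A_EXP = -8.1             # experimental C8H Aiso (G), reference 30


def averaged_coupling(p_protonated):
    """Linear (fast-exchange) average for a fraction p_protonated of the protonated form."""
    return p_protonated * A_PROTONATED + (1.0 - p_protonated) * A_NONPROTONATED


def protonated_fraction(target):
    """Invert the linear average: fraction of the protonated form that gives `target`."""
    return (target - A_NONPROTONATED) / (A_PROTONATED - A_NONPROTONATED)


print(f"equal weights: {averaged_coupling(0.5):.2f} G")
print(f"protonated fraction matching experiment: {protonated_fraction(A_EXP):.2f}")
```

Equal weights give -8.35 G (the -8.4 G quoted above), and a protonated fraction of about 0.47 would reproduce -8.1 G, so under this assumption the data are compatible with roughly equal populations of the two forms.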
Another interesting problem arising in these calculations is the inaccurate description of the puckering at a carbon center upon hydrogen atom addition. Table 5 compares theoretical and experimental isotropic HFCCs for the hydrogens located at the carbon to which the additional hydrogen has been added, for all radicals generated through net hydrogen atom addition to a double bond in the four DNA bases. Most of the calculated couplings for the two hydrogens are nearly identical. The couplings are indistinguishable because the radicals are calculated to be planar, with the two hydrogens under discussion lying on either side of the molecular plane. Accurate experimental results obtained with ENDOR, however, resolve a unique coupling for each hydrogen, which presumably reflects differences in the atomic environment due to puckering at the addition site. Only the results for T(C5H) are in good agreement with experiment (recall that it is difficult to reduce the error in computed isotropic HFCCs to less than about 20%). This illustrates that when a
Table 5: Comparison of theoretical and experimental isotropic HFCCs (G) in radicals formed by net hydrogen atom addition to a double bond in the DNA bases.(a)

Radical    Experiment(b)                        Theory
T(C5H)     48.6                                 41.9
T(C6H)     45.3, 32.0                           33.9, 33.9
C(C5H)     47.1, 31.0                           44.6, 14.0
C(C6H)     51.3, 47.7                           45.1, 42.0
A(C2H)     32.8/38.9, 54.3/47.5                 43.3, 45.5
A(C8H)     36.3/36.7/38.4, 41.6/40.9/41.0       38.9, 39.1
A(C2H+)    39.1, 40.5                           36.2, 36.7
A(C8H+)    40.9, 43.0                           40.5, 40.6
29.1 37.2 36.9 G(C8H+) 33.1/35.3 29.1 39.3 37.2 36.5/38.2 G(C8H)
(a) All data presented are for the hydrogens at the addition site.
(b) A full list of references to the experimental data can be found in references 1-4.
bulky methyl group is attached to the hydrogen addition site, the puckering is much easier to describe theoretically (the geometry of this radical is displayed in Figure 2). Another radical for which adequate calculated HFCCs were obtained is C(C6H). Although the theoretical results for this radical are smaller than the experimental values, the difference between the two hydrogen couplings (about 3 G) is essentially the same in both data sets. For all other radicals in Table 5, there is significant disagreement with experiment, which is mainly due to inadequacies in describing ring puckering. The fact that B3LYP predicts relatively planar geometries compared with other theoretical methods has been documented in the literature for T(C5H) and T(C6H) [31]. Although the main shortcoming of the calculated geometries of the radicals in Table 5 is the predicted planar geometry versus the apparently puckered experimental structures, the puckering in C(C5H) was overestimated. The difference between the experimental C5H couplings is 16 G, whereas the difference between the calculated couplings is much larger (30 G). If a planar radical is assumed, equivalent couplings are obtained theoretically (35.4 and 35.3 G). Thus, it can be concluded that the geometrical distortion in the crystalline environment must lead to a nonplanar radical with less puckering than initially calculated. For all radicals formed through net hydrogen atom addition to a double bond in one of the DNA bases (Table 5), the calculated anisotropic
couplings for the hydrogens at the addition site and/or the couplings for other hydrogens confirm the experimental assignment of each radical. These examples make it evident that, even though the experimental HFCCs were extracted at low temperatures, gas-phase calculations are often not capable of reproducing the experimental results, and additional effects must therefore be taken into account. The most common arguments used to understand why theoretical and experimental HFCCs differ involve molecular motion (the rotation of the methyl group in T(O4H)) and the hydrogen-bonding scheme and packing effects in the crystal, which can either induce geometrical changes (planar radicals versus gas-phase puckered geometries, as considered for A(N3H) and A(N3H+)) or affect the HFCCs more directly (through hydrogen bonding to neighboring sites, as discussed for G(O6H+)).
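The roughly 20% accuracy expected for computed isotropic HFCCs gives a simple screening rule for Table 5. The sketch below computes the relative error of each calculated coupling and the splitting between the two couplings at the addition site. It assumes that the two values in each cell are listed in corresponding order for theory and experiment (an assumption about the table layout), and it leaves out, for simplicity, the entries with several experimental values per hydrogen.

```python
# Isotropic couplings (G) at the addition site from Table 5: (experiment, theory),
# assuming the two hydrogens are listed in the same order in both columns.
TABLE5 = {
    "T(C5H)": ((48.6,), (41.9,)),
    "T(C6H)": ((45.3, 32.0), (33.9, 33.9)),
    "C(C5H)": ((47.1, 31.0), (44.6, 14.0)),
    "C(C6H)": ((51.3, 47.7), (45.1, 42.0)),
}
TOLERANCE = 0.20  # typical error bound for computed isotropic HFCCs quoted in the text


def summarize(exp, calc):
    """Relative errors, whether all are within TOLERANCE, and the exp/calc splittings."""
    errors = [abs(c - e) / abs(e) for e, c in zip(exp, calc)]
    within = all(err <= TOLERANCE for err in errors)
    split = (abs(exp[0] - exp[1]), abs(calc[0] - calc[1])) if len(exp) == 2 else None
    return errors, within, split


for radical, (exp, calc) in TABLE5.items():
    errors, within, split = summarize(exp, calc)
    pct = ", ".join(f"{100 * err:.0f}%" for err in errors)
    split_txt = "n/a" if split is None else f"{split[0]:.1f} / {split[1]:.1f} G"
    print(f"{radical}: errors {pct}; within 20%: {within}; splitting exp/calc: {split_txt}")
```

For these entries the rule accepts T(C5H) and C(C6H) and rejects T(C6H) and C(C5H), in line with the discussion above, and it reproduces the 16 G versus 30 G splitting for C(C5H).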
2.3 New mechanism for radiation damage in cytosine monohydrate

The radicals discussed in Sections 2.1 and 2.2 display good agreement between theory and experiment, either initially or after additional arguments were employed to understand or verify the experimental results in relation to the calculated HFCCs. The results clearly indicate that the level of theory chosen to calculate the HFCCs is adequate and can be applied to a wide range of DNA radicals (both protonated and nonprotonated). In the present section, the radicals generated in cytosine derivatives will be discussed. The most complete experimental study has been performed on cytosine monohydrate (Cm) crystals by Sagstuen et al. [32]. The suggested mechanism for radical formation in cytosine monohydrate involves net hydrogen atom removal from the N1 position of one cytosine [C(N1)] and hydrogen atom addition to the N3 position of a neighboring cytosine [C(N3H)]. The experimental and theoretical HFCCs in the two radicals believed to be the main products of radiation damage in Cm crystals will now be presented in detail.

Table 6: Comparison of theoretical and experimental HFCCs (G) for the first major radical product assigned experimentally in irradiated cytosine monohydrate crystals [C(N3H)].

         Theory: nonplanar            Theory: planar               Experiment(a)
Atom     Aiso   Txx   Tyy   Tzz       Aiso   Txx   Tyy   Tzz       Aiso   Txx   Tyy   Tzz
N3H       0.6  -2.5  -1.0   3.5       -2.9  -2.6  -0.9   3.5       -2.0  -2.1  -0.5   2.6
N4H      19.6  -1.0  -1.0   2.0       -2.7  -2.3  -0.7   3.0       -1.6  -1.7  -0.8   2.4
N4H      -1.1  -1.5  -0.5   2.0       -2.4  -2.0  -1.0   3.0
C6H     -13.7  -8.3   0.2   8.2      -14.8  -8.9   0.5   8.5      -13.5  -8.8   0.8   8.0
(a) Reference 32.
C(N3H) is the lowest energy radical formed by net hydrogen atom addition to cytosine. The calculated C6H HFCCs and one of the calculated N4H HFCCs in this radical are in very good agreement with those obtained experimentally in cytosine monohydrate (Table 6). In contrast, the N3H coupling was calculated to be smaller than that determined experimentally, while a large coupling (19.6 G) was obtained from the calculations for the second amino hydrogen. Differences between the experimental and calculated couplings of C(N3H) could arise from a rotation about the C4N4 bond in the optimized gas-phase geometry relative to the experimental structure, where hydrogen-bonding effects may be important, as discussed for radicals generated in the other nucleobases (such as A(N3H) and G(O6H+)). More specifically, owing to crystal interactions, a planar radical may predominate over one with a distorted amino group. This is supported by the optimization of a radical constrained to Cs symmetry, which is only 3.6 kcal/mol higher in energy than the nonplanar form. The two small isotropic N4H couplings, the anisotropic C6H coupling and the isotropic N3H coupling obtained for the planar radical are in much better agreement with experiment than those of the nonplanar form. Thus, the calculations confirm the experimental assignment of one of the major radical products in Cm crystals once hydrogen-bonding or crystal packing effects are taken into account.

C(N1) is the lowest lying radical formed through net hydrogen atom removal. The calculated spin density displays an alternating pattern, with the main components situated on C5 (0.49), O2 (0.35) and N1 (0.29). This distribution is quite different from that obtained experimentally (0.57 and 0.17 on C5 and N4, respectively). In addition, the calculated and experimental HFCCs deviate substantially (Table 7); even the calculated anisotropic couplings for the amino hydrogens are extremely small compared with the experimental values.
Table 7: Comparison of theoretical and experimental HFCCs (G) for the second major radical product assigned experimentally in irradiated cytosine monohydrate crystals [C(N1)].

                 Theory                         Experiment(a)
Atom      Aiso    Txx    Tyy    Tzz      Aiso    Txx    Tyy    Tzz
N4H       -0.7   -0.5   -0.4    0.9      -5.1   -3.3   -0.6    4.0
N4H       -0.5   -0.7   -0.4    1.1      -4.6   -2.2   -1.3    3.5
C5H      -11.2   -6.9   -0.4    7.2     -14.8   -7.5   -0.3    7.8
(a) Reference 32.

Since the anisotropic component is known to be calculable with a high degree of accuracy using many theoretical techniques, the deviations observed for this radical are too large to be ascribed to the method employed. One possible explanation for the deviations from the experimental couplings is a rotation about the C4N4 bond in the experimental environment, which could lead to significant N4H couplings compared with those calculated for the nearly planar structure. The variation in the HFCCs with rotation about the C4N4 bond was examined, and the agreement between the experimental and theoretical HFCCs was not improved [2]. Calculations of the couplings of the N1-dehydrogenated radical surrounded by up to four water molecules, or by additional neighboring cytosine fragments to simulate the experimental hydrogen-bonding scheme, also could not reproduce the experimental couplings [2]. Even a cytosine dimer was studied to model the N1-dehydrogenated, N3-hydrogenated diradical pair. None of these investigations led to a clear theoretical description of the experimental results [2]. Thus, since good results were obtained for so many other related DNA radicals, alternative radicals must be considered as possible sources of the observed HFCCs. Among all cytosine radicals considered, the only radical that gave couplings similar to those assigned experimentally to C(N1) is the radical formed via net hydroxyl radical addition to C5. Two conformers were optimized for this radical [C(C5OH-1) and C(C5OH-2)], and their couplings differ slightly (Table 8). Among the entire set of computed couplings for any cytosine radical, the N1H couplings obtained in each conformation of the C5-hydroxylated product (Table 8) are in best agreement with the experimental couplings assigned to the amino hydrogens in C(N1) (Table 7). The large, negative isotropic coupling obtained for C6H in these radicals is not unlike that assigned to C5H in C(N1), although the anisotropic components deviate more substantially. In addition, a C6H coupling left unassigned to a specific radical in cytosine monohydrate (Aiso = -18.2 G; Tii = -9.6, 0.9, 8.6 G) resembles those calculated for C6H in the C5-hydroxylated radicals.
The large isotropic coupling (33.0 or 37.4 G) calculated for C5H in the C(C5OH) radicals could be used as a fingerprint for the identification of this radical in future studies. Alternatively, this coupling may have gone undetected in the experiments owing to its similarity to the coupling assigned to the C5-hydrogenated radical.

Table 8: Theoretical HFCCs (G) calculated for the newly assigned major radical product generated in irradiated cytosine monohydrate crystals [C(C5OH)].

                C(C5OH-1)                       C(C5OH-2)
Atom      Aiso    Txx    Tyy    Tzz      Aiso    Txx    Tyy    Tzz
N1H       -4.2   -3.2   -1.7    4.9      -3.8   -3.5   -1.7    5.3
C5H       33.0   -1.6   -0.5    2.1      37.4   -1.5   -0.8    2.3
C6H      -10.6   -9.6   -0.3    9.9     -13.3  -10.2   -0.3   10.6
Thus, although theory does not unequivocally favor one mechanism over the other, comparison of experimental and theoretical HFCCs suggests that the experimentally proposed mechanism is less likely. Furthermore, at least two different mechanisms can be considered that yield the N3-hydrogenated and C5-hydroxylated products, and both involve water molecules. In the first postulated mechanism, ionization and electron uptake are initially assumed to occur on cytosine to form C•+ and C•−, and water subsequently adds to the former. The second postulated mechanism involves ionization of a water molecule followed by electron uptake at cytosine, resulting in a water cation and a cytosine anion; the former dissociates into hydroxyl radicals and protons. Both of these reactions have a net energy cost of 58 kcal/mol, but the second has a greater energy cost for the first step. Ionization of cytosine to form C•+ and C•−, followed by deprotonation of the cation and protonation of the anion (as suggested in the experimental study of Cm crystals), costs 68 kcal/mol. Of the mechanisms discussed, the path involving cytosine ionization and water addition is the most likely to occur. One reason is that approximately 85% of all ionization events are expected to occur on cytosine, since it possesses a greater number of electrons than water. In addition, this reaction has lower energy costs for the initial step (relative to the mechanism involving ionization of water) and for the overall process (relative to the proposed mechanism involving hydrogen addition and abstraction products). However, radiolysis of water to produce hydroxyl radicals and hydroxyl radical adducts is a commonly used technique in ESR studies [33,34]. In addition, Sevilla and coworkers have investigated the presence of hydroxyl radicals in the DNA hydration layer [34].
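The 85% figure follows from assuming that the probability of ionization is proportional to the number of electrons on each molecule. A minimal sketch under that assumption, for one cytosine (C4H5N3O) per water molecule as in the monohydrate crystal (the function names are illustrative):

```python
# Electrons per neutral atom (atomic numbers) for the elements involved.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}

CYTOSINE = {"C": 4, "H": 5, "N": 3, "O": 1}  # C4H5N3O
WATER = {"H": 2, "O": 1}                      # H2O


def electron_count(formula):
    """Total electrons in a neutral molecule given as {element: number of atoms}."""
    return sum(ATOMIC_NUMBER[element] * n for element, n in formula.items())


def ionization_share(target, others):
    """Fraction of ionizations on `target`, assuming probability proportional to electron count."""
    n_target = electron_count(target)
    return n_target / (n_target + sum(electron_count(m) for m in others))


share = ionization_share(CYTOSINE, [WATER])  # cytosine monohydrate: one water per cytosine
print(f"cytosine: {electron_count(CYTOSINE)} e, water: {electron_count(WATER)} e, share {share:.0%}")
```

Cytosine carries 58 electrons and water 10, so 58/68 ≈ 0.85 of the ionizations are expected on cytosine.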
Hydroxyl radicals were found in the intermediate hydration shell, but not in the closest hydration layer, which was speculated to be due to reactions of the hydroxyl radicals with DNA. The present work indicates that this possibility should be examined more closely. In addition, Wala et al. [35] have reported that strand breaks in DNA occur due to hydroxyl radical addition to the DNA bases, and reactions of DNA with hydroxyl radicals have been reported to lead to 5-hydroxycytosine [36]. Experimental investigations of adenine and guanine monohydrate crystals indicate that products formed through net hydroxyl radical addition may also be formed in these crystals. For example, early ESR studies on frozen aqueous solutions of deoxyadenosine 5'-monophosphate [37] revealed one isotropic coupling (29 G), which was believed to be due to a radical formed through addition of a hydroxyl radical to C8 in adenine [A(C8OH)]. The calculated result for the C8-hydroxylated radical (28.8 G) indicates that this coupling is
indeed due to the C8H in A(C8OH). Furthermore, the calculations show that a better resolved spectrum would yield experimental couplings for C2H, N9H and both of the amino hydrogens. The spectrum of the C4-hydroxylated guanine radical [G(C4OH)] was recorded in crystals of 3'5'cGMP [23]. The observed radical was determined to possess a C8 spin density of approximately 0.25 (calculated value: 0.26). The only coupling extracted from the experiments was for C8H, whose principal values are (-10.1, -6.9, -3.1 G). These couplings agree reasonably well with the principal values obtained from the calculated Aiso and Tii for the proposed radical (-12.3, -8.5, -2.7 G). If the individual components of the coupling tensor are considered, however, only fair agreement with experiment is obtained. The protonated radical formed by net hydroxyl radical addition to C8 in guanine [G(C8OH+)] has been observed in single crystals of G·HCl·2H2O [29]. The observed spectrum consists of a large C8H isotropic coupling (20.2 G) and a very small anisotropic tensor (-1.1, -0.6, 1.6 G), in excellent agreement with the calculated values (Aiso = 17.5 G, Tii = -0.9, -0.5, 1.4 G). Experimentally, another isotropic coupling was observed for N7H (-8.5 G), which also possesses large anisotropy (-6.5, -1.5, 8.0 G). The calculated N7H couplings (Aiso = -7.0 G; Tii = -5.9, -1.7, 7.6 G) are also in agreement with experiment. The agreement observed for the N7H couplings is impressive, since the local environment (hydrogen bonding) has been shown to affect the couplings of the hydrogen at N7 in other radicals (for example, G(O6H+), Section 2.2). A small N9H isotropic coupling was also obtained experimentally (-2.2 G) and theoretically (-2.1 G). Experimentally, it was speculated that the observed spectrum could be due to G(C8H+), where the additional hydrogen is added at an in-plane position and, thus, only one large C8H coupling is observed.
This alternative seems very unlikely given the excellent agreement between the experimental and calculated HFCCs for G(C8OH+). The examples of hydroxyl radical addition products identified in single crystals of adenine and guanine base derivatives confirm that water can play an important role in the radiation damage mechanism. It should be noted that the new mechanism has attracted some criticism [38]. The main objection is that nonplanar geometries were calculated for the gas-phase radicals, whereas planar structures are expected experimentally due to the hydrogen-bonding scheme [38]. As shown in this chapter, accounting for crystal interactions improves the results in most cases [e.g., C(N3H)], though not for C(N1). Thus, the newly proposed mechanism should not be discarded based solely on these arguments [39].
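Comparisons such as the one made above for the C8H tensor of G(C4OH) rest on the standard decomposition of a hyperfine tensor's principal values A_ii into an isotropic part, Aiso = (A_xx + A_yy + A_zz)/3, and a traceless anisotropic part, Tii = A_ii - Aiso. The sketch below applies this decomposition to the experimental and calculated C8H principal values quoted in the text. It assumes that both triplets are full principal values (Aiso + Tii) listed in matching order; the function name is hypothetical.

```python
from statistics import fmean


def split_hyperfine_tensor(principal_values):
    """Split principal hyperfine values (gauss) into Aiso and the traceless Tii."""
    a_iso = fmean(principal_values)                    # Aiso = trace / 3
    t_ii = tuple(a - a_iso for a in principal_values)  # Tii = A_ii - Aiso, sums to ~0
    return a_iso, t_ii


# C8H principal values for G(C4OH), in gauss, as quoted in the text.
experiment = (-10.1, -6.9, -3.1)  # 3'5'cGMP crystals [23]
theory = (-12.3, -8.5, -2.7)      # calculated Aiso + Tii

for label, values in (("experiment", experiment), ("theory", theory)):
    a_iso, t_ii = split_hyperfine_tensor(values)
    t_text = ", ".join(f"{t:.2f}" for t in t_ii)
    print(f"{label:10s} Aiso = {a_iso:6.2f} G  Tii = ({t_text}) G")
```

This gives Aiso ≈ -6.7 G and Tii ≈ (-3.4, -0.2, 3.6) G for experiment versus Aiso ≈ -7.8 G and Tii ≈ (-4.5, -0.7, 5.1) G for theory, which illustrates why the overall agreement is reasonable while the component-by-component agreement is only fair.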
Assigning the N3-hydrogenated and C5-hydroxylated radicals as the major radiation products in cytosine monohydrate crystals would also explain why the Cm couplings assigned to the N1-dehydrogenated radical are absent in the larger cytosine systems. Previously it was assumed that these couplings were not observed because a methyl or sugar group replaces the hydrogen at N1, preventing the N1-dehydrogenated radical from forming. A new explanation uses the fact that water was not present in previous crystal studies and, thus, the C5-hydroxylated product could not form. Monohydrate crystals of deoxycytidine 5'-monophosphate (5'dCMP) [40] were studied, however, and the similarity between the couplings observed in these crystals (assigned to the cation) and those experimentally assigned to the N1-dehydrogenated radical in Cm should be noted. All of the evidence presented above, together with the good agreement observed for many other related DNA radicals, strongly suggests that water plays an important role in monohydrate crystals and will therefore most likely play an important role in the radiation damage mechanism in biological systems.
2.4 Sugar radicals in irradiated DNA
As mentioned, sugar radicals have been investigated at the same level of theory as that discussed for the DNA bases. Sugar radicals can be formed through direct mechanisms, in which alkoxyl or base radicals are generated and the radical character is transferred to the sugar group, and through indirect mechanisms, in which hydrogen or hydroxyl radicals generated by water radiolysis attack the sugar group. In an important study, Schuchmann and von Sonntag [41] concluded that hydroxyl radicals attack the six carbon atoms in D-glucose to an equal extent. However, ESR techniques have been unable to detect sugar radicals in irradiated DNA [42]. Hole et al. [21] were the first to observe a large variety of sugar radicals, in their ENDOR study of 2'-deoxyguanosine 5'-monophosphate, where nine sugar radicals were characterized. This provides a nice example of the power of the ENDOR technique, since ESR did not easily detect these radicals. A subsequent ENDOR study of single crystals of deoxyadenosine [43] supported the hypothesis that many sugar radicals are generated upon irradiation. Theoretical investigations of carbon-centered sugar radicals have appeared in the literature [44,45]. In these studies, geometries, relative energies, spin density distributions and hyperfine coupling constants were calculated at the Hartree-Fock level. Both studies were thorough and carefully performed at the level of theory chosen. However, Hartree-Fock overestimates the hyperfine coupling constants considerably, and methods accounting for electron correlation are essential to calculate this property accurately.
The model sugar group chosen (Figure 7) models the phosphate groups with hydroxyl groups and the DNA base with an amino group. The sugar radicals investigated include hydrogen abstraction radicals formed by removal of a hydrogen from each carbon and oxygen atom, radicals formed via removal of either of the hydroxyl groups in the model system, and a variety of radicals involving significant alterations of the sugar ring. Some of the results obtained for these systems are presented in this section. Two different puckering modes were examined for each possible radical, corresponding to north (N) and south (S) radicals, which are defined according to where the radical is located on the pseudorotation cycle [46]. The nonradical sugars present in A- and B-DNA are in the north and south conformations, respectively. B3LYP predicts the C4'• and C2'• radicals to be the lowest and highest energy radicals, respectively, among those formed by hydrogen abstraction from a carbon. This information is useful for determining which sugar radicals are most easily generated and, thus, which radicals are involved in strand breaks, as mechanisms involving almost all carbon centers have been proposed. The C2' radicals were determined to be relatively flat at the radical center, since no oxygen is present at a neighboring site, which removes the unfavorable interactions with lone pairs. This lack of stabilization helps to explain why these were calculated to be the highest energy radicals in this class and why C2' radicals have only appeared as minor products in experimental studies [21]. For all hydrogen and hydroxyl abstraction radicals, the major geometrical alterations that occur upon radical formation affect only the bonds and angles involving the radical center. The bonds between the radical center and surrounding atoms are generally contracted by between 0.04 and 0.07 Å. The
bond angle with the radical center as the central atom changes by between 2 and 8°. The remainder of the sugar ring geometry in all radicals is relatively unaffected.
Figure 7: Structure and numbering of the sugar group present in DNA (I) and the model system used for the calculations presented within (II).
The couplings present in the spectra of a number of irradiated DNA molecules have been assigned to the radical formed via net hydrogen atom removal from C1' (C1'•) [21,26,43,47]. Hole et al. [21] determined that the π-spin density at C1' in this radical is 0.64, which is smaller than the calculated value (0.75). Comparison of experimental and theoretical HFCCs indicates that the calculations support the experimental assignment of the C1' radical (Table 9). In particular, the experimental results agree more closely with those calculated for the N-type radical. One of the C2'H couplings calculated for C1'•(S) is significantly smaller (9.1 G) than the experimental result (approximately 18 G). This is a nice example of the effects of sugar ring puckering on the HFCCs. It should be noted that although the C2'H isotropic components differ between the N- and S-type radicals, the anisotropic values are almost identical. Another example of the predictive power of theoretical calculations can be found in C3'•. Comparing the two calculated C2'H couplings in the N- and S-type conformers with the experimental results obtained from 2'-deoxyguanosine 5'-monophosphate (Table 9) [21] does not reveal which conformer was observed. However, the calculated HFCCs for the N- and S-type C3' radicals differ in that no C4'H coupling is present in the latter conformation. Since a large C4'H coupling was recorded experimentally, the calculations predict this radical to be present in the north conformation. The calculated values indicate that O3'H has a small isotropic coupling and a relatively large anisotropic contribution; these were not reported in the experimental study. However, experimentally another coupling was observed for which only the principal components (16, -22, 29 G) were resolved, and it was not assigned to a particular atom. The unassigned couplings are not unlike those of a C2' hydrogen and could possibly be due to a C2'H in a ring with another conformation. Any difference between the experimental and calculated isotropic hyperfine coupling constants in this radical could be due to the presence of a phosphate group at the C5' position in the experimental study, since it has previously been determined that phosphate groups affect the HFCCs in the C3' radical [45].
Table 9: Comparison of theoretical and experimental HFCCs (G) for select sugar radicals formed via net hydrogen atom removal from one of the ring carbons.
                  Theory: North            Theory: South            Experiment(a)
Radical  Atom     Aiso  Txx   Tyy   Tzz    Aiso  Txx   Tyy   Tzz    Aiso  Txx   Tyy   Tzz
C1'•     C2'H     18.5  -1.4  -1.0  2.4    9.1   -1.9  -1.4  3.3    17.2  -1.9  -1.7  3.6
         C2'H     22.7  -1.9  -1.6  3.4    29.3  -1.4  -1.1  2.5    25.4  -1.7  -0.7  2.3
C3'•     C2'H     18.9  -1.7  -1.5  3.2    12.4  -1.6  -1.1  2.7    16.7  -1.6  -0.9  2.5
         C2'H     34.0  -1.5  -1.1  2.6    31.2  -1.9  -1.5  3.5    38.1  -2.1  -1.4  3.4
         O3'H     -2.8  -4.3  -3.1  7.4    -2.3  -4.5  -3.3  7.7
         C4'H     22.5  -1.6  -1.3  2.8                             27.5  -1.7  -0.9  2.7
C4'•     C5'H     27.9  -1.8  -1.1  2.8
         C5'H     2.8   -2.1  -1.4  3.5    6.2   -2.4  -1.1  3.5
         C3'H     22.1  -1.8  -1.1  2.8    31.4  -1.9  -1.2  3.2
         O5'H                              5.6   -1.6  -0.8  2.5
(a) References for experimental data: C1'•, reference 43; C3'•, reference 21.
Not all experimental and theoretical HFCCs for the sugar radicals are in such nice agreement with one another. A typical example is the C4' radical, which has been observed in three different crystals: uridine 5'-monophosphate (5'rUMP) [48], inosine (rI, which can be derived from adenosine by replacing the amino group at C6 with a hydroxyl group) [49] and adenosine:5-bromouracil (rA:5BrU) [18]. The calculated C4'•(N) radical exhibits two C5'H couplings, one of substantial magnitude (27.9 G), and no O5'H coupling, while C4'•(S) has a significant O5'H coupling (5.6 G) and only one small C5'H coupling (6.2 G). Experimentally, three substantial couplings of 36, 25 and 24 G were recovered in crystals of 5'rUMP [48], and two small couplings were observed at certain orientations, for which accurate HFCCs could not be evaluated. In rI [49], large C3'H (34.7 G) and C5'H (33.4 G) couplings, as well as a small C5'H (3.4 G) coupling, were obtained. In rA:5BrU, two couplings were resolved, corresponding to the C3' and C5' hydrogens (21.0 and 10.0 G, respectively). Overall, poor agreement was observed between the theoretical (Table 9) and experimental HFCCs quoted above.
Additionally, no anisotropic components, which are important for comparison with theoretical work, were isolated in the experiments. Differences between the theoretical and experimental isotropic couplings could arise from the alterations experienced when phosphate groups replace the hydroxyl model group [45]. However, more accurate experimental and theoretical data are required to verify this radical assignment. Another example of disagreement between theory and experiment is the C5'• radical, which has been assigned in studies of various DNA constituents [21,43,47,50]. The HFCCs calculated for the north and south conformers are in close agreement with one another, as the radical center is outside the sugar ring, the part of the molecule involved in the puckering. Theoretically, relatively small isotropic couplings were obtained for C5'H and O5'H, while a large coupling was calculated for C4'H (Table 10). Experimentally, large
couplings were elucidated for C5'H and small values for C4'H and O5'H (Table 10). Despite these differences, the anisotropic couplings are in much better agreement. However, the experimental trend stated for the isotropic couplings does not hold for all of the experimental results, as even the experimental results differ greatly between crystalline environments. Due to these discrepancies, an in-depth investigation of the couplings assigned to C5'• is required.
Table 10: Comparison of experimental and theoretical HFCCs (G) for C5'•.
Source                Atom     Aiso    Txx      Tyy     Tzz
Experiment
5'dGMP(a)             C5'H     -22.2   -8.7     0.8     7.9
                      C5'H     -20.9   -8.6     0.6     8.0
                      C5'H     -20.8   -8.8     0.8     8.0
                      C5'H     -19.6   -8.7     0.5     8.2
                      C4'H     2.5
                      O5'H*            (16.3)   (20.2)  (28.1)
dA(b)                 C5'H     -14.7   -7.9     -1.7    9.7
                      C4'H     7.0±1
dAm(c)                C5'H     -17.5   -11.8    0.8     11.0
3'CMP(d)              C5'H     -22.7   -9.3     0.7     8.5
                      C4'H     4.5     -3.0     0.1     3.0
                      O5'H     20.8    -4.3     -1.5    5.8
5CldU and 5BrdU(d)    C5'H     -20.7   -12.5    2.9     9.6
                      O5'H     8.6     -3.1     -1.0    4.2
                      C4'H     18.9    -1.6     -0.2    1.9
Theory
C5'•(N)               C5'H     -9.4    -11.2    -0.8    12.0
                      C4'H     33.6    -1.7     -0.8    2.5
                      O5'H     -4.3    -5.1     -3.4    8.5
C5'•(S)               C5'H     -10.4   -10.9    -0.7    11.7
                      C4'H     35.3    -1.6     -0.8    2.4
                      O5'H     -3.9    -5.0     -3.3    8.3
(a) Reference 21. (b) Reference 50. (c) Reference 43. (d) Reference 42.
Since significant effects on the HFCCs can be observed with changes in geometry (as discussed for T(O4H)), an investigation of the dependence of the HFCCs on rotation about the C5'C4' bond was undertaken. The XC5'C4'C3' dihedral angles (X = O5' or H5') in the north conformer were varied in increments of 15°, starting from the optimized geometry (289.3° and 144.4° for X = H5' and O5', respectively), and single-point calculations were performed at each step. The variation in the C4'H, C5'H and O5'H HFCCs as a function of rotation angle is displayed in Figure 8. It is interesting to note that upon rigid rotation,
the isotropic component of the HFCCs changes considerably, while the anisotropic components (not shown) differ by no more than twenty percent from the values displayed in Table 10.
Figure 8: The C4', C5' and O5' hydrogens' HFCCs (G) versus the rotation angle (deg.) about the C5'C4' bond for the C5'(N) radical.
On average, the rotation barrier about the C4'C5' bond is 8.6 kcal/mol, with maximum and minimum values occurring at 90° (14.4 kcal/mol) and 15° (1.4 kcal/mol), respectively. The results from the rotational study (Figure 8) shed some light on the dependence of the HFCCs on rotation about the C5'C4' bond. The calculated C5'H isotropic HFCC does not reach the experimental value (-22 G) obtained in 5'dGMP, but comes close to the value obtained in dAm (-17 G) upon a 300° rotation (-16.7 G). The variation between the O5'H and C4'H results obtained for 3'CMP, 5CldU and 5BrdU can also be understood from these results. For 3'CMP, the calculated values that satisfy both the C4' and O5' experimental couplings occur at a 130° rotation, where Aiso(O5'H) = 22.6 G and Aiso(C4'H) = 8.1 G (experimental values: 20.8 and 4.5 G, respectively). Similarly, results in agreement with the 5CldU and 5BrdU experimental HFCCs occur upon a 150° rotation, where Aiso(C4'H) = 17.7 G and Aiso(O5'H) = 10.3 G (experimental values: 18.9 and 8.6 G, respectively). Hence, once geometrical effects are accounted for, the calculated and experimental HFCCs agree very well. The poor agreement between theory and experiment cannot always be improved by investigating rotational effects alone. For example, a rotational
study similar to that discussed for C5'• was carried out in an attempt to improve the theoretical results for the O5' alkoxyl radical (the results are not shown explicitly here; the reader is referred to reference 5). However, not all of the experimental results could be understood. The main explanation given for the poor agreement between theory and experiment for the O5'- (and O3'-) centered radicals is that hydrogen bonding and the crystal environment strongly affect the HFCCs in these radicals; calculations accounting for these effects must therefore be performed before improved agreement can be obtained.
Figure 9: The structure of model C4' (I) and C1' (II) centered radicals formed through opening the sugar ring.
One radical which involves damage more extensive than the sole removal of a hydrogen atom or breakage of a phosphoester bond is a C4' radical generated through breaking the C4'O1' bond (I, Figure 9) [21,51,52]. It is also possible that the C1'O1' bond breaks; however, the resulting radical (II, Figure 9) has not been observed experimentally. The experimental HFCCs for the C4'-centered ring-opened radical differ greatly (Table 11). However, the sums of the C5'H couplings are very similar (61 and 64 G), indicating that alternative conformers may be responsible for the differences. The calculations reveal a large isotropic C4'H coupling (-21.3 G) possessing significant anisotropy (-13.0, 0.0, 13.0 G), not unlike that assigned in uridine 5'-monophosphate (5'rUMP) [52]. One small and one large C5'H coupling were also obtained from the calculations (3.8 and 32.8 G, respectively). The large C5'H coupling is not unlike those assigned experimentally (48 and 37 G). However, a larger coupling was obtained experimentally for the remaining C5'H than was calculated, and a substantial C3'H coupling was calculated but not observed experimentally. The good agreement observed for the C4'H and one of the C5'H couplings is promising, and more accurate theoretical and experimental studies could possibly resolve the remaining discrepancies and confidently identify this radical.
Table 11: Comparison of theoretical and experimental HFCCs (G) for radicals formed through extensive damage to the sugar ring, as displayed in Figures 9 and 10.
                                 Theory                         Experiment(a,b)
Radical                 Atom     Aiso    Txx    Tyy    Tzz      Aiso          Txx    Tyy    Tzz
Figure 9, I             C4'H     -21.3   -13.0  0.0    13.0     -18.8 (32.2)  -9.8   -0.2   10.0
                        C5'H     32.8    -2.3   -0.8   3.1      48 (37)
                        C5'H     3.8     -2.1   -1.5   3.5      13 (27)
                        C3'H     32.4    -1.9   -1.0   3.0
Figure 10, II/III/IV    C5'H     -14.1   -7.6   -1.1   8.8      -16.9         -8.0   0.2    7.8
                        C4'H     -2.8    -1.8   -1.3   3.0      3.8           -1.6   -0.2   1.9
                        O5'H     -4.2    -4.4   -3.0   7.4
Figure 10, IV/V         C5'H     0.0     -0.8   -0.7   1.5
                        O5'P     -21.1   -2.2   1.0    1.2
                        C2'H     -12.7   -7.8   0.0    7.8
                        C1'H     27.3    -1.0   -0.4   1.5
                        C5'H     11.2    -1.3   -0.7   2.0
(a) Experimental HFCCs for structure I, Figure 9, are from reference 52 (and Aii from 51). (b) Experimental HFCCs for structures in Figure 10 are from reference 53.
The second series of ring-breaking radicals is formed through removal of a portion of the sugar ring. The radical depicted in structure I, Figure 10, has been proposed to be formed in nucleotides by abstraction of a hydrogen atom from the C5' position by a base radical, followed by breakage of the sugar ring and reorientation about the C4'O1' bond [21]. A very similar radical appears in structure II; this radical was observed only after irradiation at room temperature [53]. The coupling constants in these radicals were calculated using a model system (structure III) that represents either the phosphate [21] or carbon
[53] group with a hydroxyl group.
Figure 10: Model systems used for various ring-breaking sugar radicals: radicals observed experimentally (I and II), model ring-breaking radical (III), C5'-centered radical proposed experimentally (IV) and the model ring-breaking radical with a phosphate group (V).
The experimental results (Table 11) include a large C5'H isotropic coupling (approximately -17 G) and a small C4'H isotropic coupling (4 G). The major difference between the two experimental data sets is the magnitude of the largest component of the anisotropic tensor. The C5'H couplings calculated using the model system are in good agreement with the experimental results, although the calculated anisotropic results agree more closely with those shown in Table 11 [53] than with the results obtained in other crystals (-11.0, 1.1, 9.8 G) [21]. Hole et al. [53] proposed that an alternative explanation for the couplings observed in 5'GMP is the radical displayed as structure IV (Figure 10), where a large experimental coupling (-17 G) was suggested to arise from the phosphate group. The model system displayed in structure V was used to test this hypothesis. The calculated results indicate that the phosphorus yields a coupling (-21 G) similar to that observed experimentally. However, the calculated anisotropic phosphorus coupling and the experimental anisotropic C5'H coupling do not agree. Thus, given the better agreement obtained for the ring-breaking radical modeled by structure III, it can be concluded that the most likely structure for the observed radical is that displayed as structure I. The calculations presented in this section support experimental data suggesting that many different base and sugar radicals are formed upon irradiation of DNA base derivatives. In fact, the calculations even support the formation of sugar radicals whose importance as products has been disputed. Furthermore, the formation of ring-altering sugar radicals in single crystals has been confirmed. This is very important information, since sugar radicals have not been assigned in the spectra of full DNA. The confident identification of base and sugar radicals in single crystals will aid in the discovery of these radicals (if formed) in irradiated DNA.
Understanding whether these radicals are generated in full DNA, or whether they react to form other radicals, is important for the field of DNA radiation damage.
3. FULL DNA STUDIES
The previous sections have discussed the effects of radiation on individual DNA components in relation to experimental results obtained from single crystals of base derivatives at low temperatures. The question can now be addressed of how relevant these studies are to the identification of the radiation products in full DNA. Early ESR work on DNA revealed that the classification of radiation products is a difficult task, since many of the DNA radicals are extremely similar and, therefore, the hyperfine couplings and g-factors are not sufficient to separate their spectra. The use of a variety of experimental conditions allows the determination of the dependence of
radical formation on the environment (for example, strand conformation, hydration level, O2 content). Annealing experiments are also useful for determining which radicals are formed via the decay of another product, or for simplifying the spectra.
3.1 The primary radicals
Studies have been performed on DNA both in the dry state and in aqueous solutions [47]. Frozen aqueous solutions and low-temperature glasses have occasionally been employed to investigate full DNA. The former are advantageous since they allow for the easy addition of additives, such as the electron scavengers (FeCl3 or K3[Fe(CN)6]) used to obtain information about electron loss centers. The latter are also useful, since different reactive radicals can be stabilized and the specificity of a reaction can be studied by carefully selecting the glass-forming agent. For example, hydroxyl radicals are known to be abundant in BeF2 glasses, electrons in LiCl glasses or in the presence of strong bases (NaOH), and hydrogen atoms in strong acids (H2SO4). Lyophilized (freeze-dried) powders, prepared either completely dry or with varying degrees of hydration (typically 2.5 to 11 water molecules per nucleotide), are also often used. These experimental techniques yield random orientations of the DNA molecules, and therefore the spectra are very broad and lack distinguishing features. Single-crystal studies would be beneficial, but it is not possible to prepare such samples for an entire DNA strand. An attractive alternative is to use oriented fibers. Despite great efforts by experimentalists, the exact identity of most radical products generated in irradiated DNA is still unknown. The first ESR studies on full DNA only provided evidence for the formation of a thymine-centered radical [54,55]. Work performed on DNA irradiated by ultraviolet light [56] and on oriented fibers [57] confirmed this radical to be formed through net hydrogen atom addition to C6 in thymine [T(C6H)]. The definitive identification of a thymine radical product led to the suggestion, and subsequent proof [58], that T•- must be initially formed in irradiated DNA.
Figure 11: The primary radical products generated according to the two-component model for DNA radiation damage (T•- and G•+).
Following these studies, little progress was made in classifying additional radiation products in full DNA for years, although work continued on single crystals of base derivatives and other DNA subunits (as discussed in Section 2). The model of radiation damage to DNA was greatly enhanced through work performed on oriented fibers by Gräslund and coworkers [59,60,61]. The radical mixture generated in DNA was suggested to be composed of thymine (and/or cytosine) anions and guanine (and/or cytosine) cations [59]. The initial assumption that cytosine may also be damaged was discarded [61], and the picture of radiation damage in DNA resulting in T•- and G•+ became known as the "two-component" model (Figure 11). The two-component model for radiation damage in DNA was often criticized [18,62]. The main criticism was that the formation of T•- was favored over C•- only because the thymine anion converts to T(C6H), whereas products generated from cytosine anions were not observed. Additionally, the two-component model was criticized because the spectrum assigned to T•- in nondeuterated DNA samples is in poor agreement with that obtained from single-crystal studies [62], and the spectrum does not change appropriately upon deuteration. This indicates that some other species must be responsible for the spectrum [63], possibly C•-, which yields a doublet with couplings approximately equal to those assigned to T•-. Cullis and coworkers [64] pointed out that the two-component model for damage to DNA seems surprising, since ionizing radiation damages indiscriminately and therefore the initial electron gain and loss centers should include water, the phosphate group, the sugar moiety and all four bases.
This was verified by examining DNA strand breaks, which were found to form at all centers rather than exclusively at thymine and guanine, as the two-component model predicts [64]. More evidence supporting C•- as the major anion formed in irradiated DNA has also appeared in the literature. Bernhard and coworkers determined that C•- is the predominant electron gain radiation product in low-temperature glasses of oligonucleotides [65] and that it may also be the major anion generated in DNA [66]. Through the use of computer simulations, Sevilla et al. [67] determined that 77% of all anions are C•- at 100 K. However, since the spectra of C•- and T•- are so similar, slight changes in the simulation input can yield very different percentages [64]. Furthermore, the one-electron reduction potentials of the bases in aqueous solutions indicated that C•- has a greater tendency to be protonated by its base-pair partner guanine than T•- has to be protonated by adenine, and therefore cytosine should be the most easily reduced base in DNA [68]. Studies on frozen DNA
samples predicted that T•- slightly prevails in single-stranded DNA, whereas C•- predominates in double-stranded DNA, where the differences arise from interstrand base-pairing and base-stacking effects that allow electrons to travel throughout the strand [69]. The debate over the site of electron loss in DNA is much less pronounced, since it has been estimated that over 90% of the cations generated in DNA are centered on guanine [67] and that guanine end products account for 90% of the electron loss products in DNA [70]. However, the spectra of G•+ recorded in solid-state studies of nucleotides and nucleosides do not correspond to the spectrum recorded in full DNA [71], and investigations of the strand-break specificity determined that some adenine cations could be generated [64]. Thus, it is also possible that other cations, primarily A•+, are formed. More information about the specificity of electron gain and loss in DNA can be obtained by calculating the ionization potentials (IPs) and electron affinities (EAs) of the bases. Table 12 compares the experimental IPs of the nucleobases with those obtained from Møller-Plesset (MP2) single-point calculations on HF geometries [72] and from DFT (the B3LYP functional). The theoretical data are in good agreement with the experimental results, and all three data sets predict the ionization potentials to follow the trend T > C > A > G. Thus, an electron is most easily removed from guanine, which supports the experimental prediction that the guanine cation is the major oxidation product in irradiated DNA. Limited experimental data are available for the EAs of the DNA bases. The trend in the "estimated EAs" (obtained by correcting the HF Koopmans EA by the calculated nuclear relaxation energy) is T > C > A > G, in agreement with early studies on DNA predicting that the thymine anion is the major reduction product upon irradiation [72].
Alternatively, the trend predicted from the adiabatic EAs calculated with DFT (C > T > G > A) supports experimental data predicting cytosine to be the major reduction site in irradiated DNA.

Table 12: The adiabatic IPs and EAs (kcal/mol) of the DNA bases obtained at various levels of theory and experimentally.

Base   IP: DFT   IP: MP2   IP: Exp.   EA: DFT   EA: DFT(+)   EA: "Estimated"
T      196.0     204.2     204.6      -14.8       3.3           7.2
C      194.2     201.5     200.1      -13.8      -1.4           4.8
A      182.3     188.6     190.5      -17.7      -9.1          -7.2
G      171.8     176.6     179.3      -15.8      -6.4         -16.7

The interesting feature of the EAs is that the "estimated" values for A and G, as well as all adiabatic DFT results, are negative. The EA is defined as the energy released when an electron is added to a neutral molecule and is calculated as the energy of the neutral molecule minus the energy of the anion. Therefore, a negative EA indicates that the anion is higher in energy than the corresponding neutral molecule. A negative EA cannot be measured experimentally because the anion dissociates into an electron and the neutral molecule before nuclear relaxation can occur. One major flaw in the DFT results is that diffuse functions were not included in the calculations, and these are known to be essential for the accurate calculation of EAs. Diffuse functions on the heavy atoms were therefore added through the 6-31+G(d,p) and 6-311+G(2df,p) basis sets for the geometry optimizations and single-point calculations, respectively (Table 12, DFT(+)). With diffuse functions, only thymine has a positive EA. Additionally, the order of T and C is reversed, so that the results now indicate that thymine is the most favorable anion formed on a base center. The IPs also improve with diffuse functions; for example, the IP of thymine changes from 196.0 to 201.6 kcal/mol (experimental value: 204.6 kcal/mol). To better understand the trend in EAs for the DNA bases, a more systematic study is needed. A good starting point would be to apply techniques known to yield highly accurate thermochemistry, such as the Gaussian-n methods, to the smaller bases thymine and cytosine. In particular, the G3 methods based on Møller-Plesset perturbation theory have reduced the computational cost of these approaches (allowing calculations to be extended to at least 10 heavy atoms) while at the same time increasing their accuracy.
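The orderings and deviations discussed above can be checked directly against the values in Table 12. The short Python sketch below is only an illustration: the dictionary layout and the helper names (`ranking`, `mean_abs_dev`, `electron_affinity`) are introduced here and are not part of the original calculations. It ranks the bases for each column, computes the mean absolute deviation of the DFT and MP2 IPs from experiment, and encodes the sign convention EA = E(neutral) - E(anion).

```python
# Table 12 values (kcal/mol), transcribed from the text.
IP = {
    "DFT": {"T": 196.0, "C": 194.2, "A": 182.3, "G": 171.8},
    "MP2": {"T": 204.2, "C": 201.5, "A": 188.6, "G": 176.6},
    "Exp": {"T": 204.6, "C": 200.1, "A": 190.5, "G": 179.3},
}
EA = {
    "DFT": {"T": -14.8, "C": -13.8, "A": -17.7, "G": -15.8},
    "DFT(+)": {"T": 3.3, "C": -1.4, "A": -9.1, "G": -6.4},
    "Estimated": {"T": 7.2, "C": 4.8, "A": -7.2, "G": -16.7},
}

KCAL_PER_EV = 23.0605  # conversion factor: 1 eV = 23.0605 kcal/mol


def electron_affinity(e_neutral: float, e_anion: float) -> float:
    """EA = E(neutral) - E(anion); a negative EA means the anion lies higher in energy."""
    return e_neutral - e_anion


def ranking(values: dict) -> str:
    """Return the base labels ordered from largest to smallest value, e.g. 'TCAG'."""
    return "".join(sorted(values, key=values.get, reverse=True))


def mean_abs_dev(calc: dict, ref: dict) -> float:
    """Mean absolute deviation of calculated values from reference values."""
    return sum(abs(calc[b] - ref[b]) for b in ref) / len(ref)


if __name__ == "__main__":
    for method, vals in IP.items():
        print(f"IP {method:<10} {ranking(vals)}")
    for method, vals in EA.items():
        print(f"EA {method:<10} {ranking(vals)}")
    for method in ("DFT", "MP2"):
        mad = mean_abs_dev(IP[method], IP["Exp"])
        print(f"MAD of {method} IPs vs. experiment: {mad:.2f} kcal/mol")
    print(f"Experimental IP of G: {IP['Exp']['G'] / KCAL_PER_EV:.2f} eV")
```

Run as a script, it reproduces the orderings stated in the text (T > C > A > G for all three IP columns and for the estimated EAs, C > T > G > A for the DFT EAs, and T > C > G > A once diffuse functions are included). It also shows that the MP2 IPs track experiment considerably more closely than the DFT values.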
3.2 The Secondary Radicals
At higher temperatures, ionic radicals are not expected to be stable; rather, these species protonate or deprotonate to form neutral, secondary radical products. As mentioned, the first of these products, T(C6H), was identified in early ESR studies as evolving from the thymine anion. Later, the decay of the guanine cation was predicted to be related to the growth of G(N1) [73,74]. T(CH2) has also been observed in highly hydrated DNA samples [73]. Evidence exists that the cytosine anion is stabilized by protonation at N3 at 77 K [75]. Additionally, in thymine deuterated DNA samples, a deuteron has been determined to add to the C6 position of the cytosine anion [73]. Although a diverse range of products has been observed, each was identified in a different sample.
[Figure 12: Radicals predicted to be formed in oriented (O) and randomly oriented (R) samples of DNA. Panels: T•− (O,R), T(O4H) (O), T(CH2) (O,R), T(C6H) (R), C•− (O,R), C(N3H) (O,R), C(N4H) (R), A•+ (O), A(N3H) (O), G•+ (O,R), G•− (O), G(N2H) (O), G(N1) (R), C8• (R), and the sugar radicals C1'• (O,R), C3'• (R) and C4'• (R).]

Advances have been made in the past few years toward identifying more than two or three products in one DNA sample. The most promising results were obtained by Hüttermann and coworkers, in both oriented fibers [76] and randomly oriented DNA [77,78]. Using the field-swept electron spin-echo technique, nine clear patterns were identified, and seven radicals proposed (Figure 12), in the ESR spectrum of oriented DNA fibers at 77 K [76]. Species identified that had previously been discussed in the literature as possible damage products in oriented fibers include T•− or T(O4H), C•− or C(N3H), and G•+. Newly proposed radicals for oriented fibers include T(CH2) and the C1'• sugar radical. An assignment was also made to A(N3H), although the spectrum of this radical was not clear in full DNA and differs from that obtained in the copolymer poly(A:U). Another spectrum, for which little direct information could be obtained, had previously been assigned to G•+. However, since G•+ was already assigned in the study under discussion, suggested assignments include G•− or A•+, as the adenine anion was already related to A(N3H). The two remaining components could not be assigned owing to insufficient information. The first study on randomly oriented DNA that detected more than two or three ionic species was performed on DNA equilibrated at various levels of hydration, as well as on frozen aqueous solutions [77]. In lyophilized powders, G•+, C•− and T•− were identified without any uncertainty for the first time. The spectra obtained for frozen aqueous solutions were very different from those of samples equilibrated at 76% relative humidity, since the amount of G•+ was considerably reduced. T(C6H) and T(CH2) were also assigned. A continuation of this study directly analyzed the spectra of lyophilized DNA powders (dry and equilibrated at 76% relative humidity) using electron scavengers, rather than relying on results from model systems. Many new radicals were identified besides T(C6H) (Figure 12). The spectrum previously assigned to G•+ was reassigned to the cytosine radical formed via net hydrogen atom addition to the amino group [C(N4H)].
This is the first time C(N4H) has been proposed for DNA, although it has been identified in aqueous solutions of cytosine derivatives [79]. Spectra were also assigned to T•−, C•− or C(N3H), T(CH2), C1'• and G•+. Two additional patterns were acknowledged for the first time: one was speculated to be due to radical addition to the C8 position of one of the purines (C8•), and the other to G(N1). An additional spectrum was speculated to be due to the C4'• or C5'• sugar radical, but a definitive assignment could not be made. At high radiation doses, a spectrum appeared that gave strong indications of being due to C3'• or C4'•. These studies on oriented fibers and randomly oriented DNA are very important, since they are the first to demonstrate the great variety of radicals that can be identified in irradiated DNA.

The role of sugar radicals in DNA radiation damage is uncertain, since no sugar radicals were identified in preliminary studies on full DNA samples [42]. However, at least nine different sugar radicals were observed in irradiated single crystals of 2'-deoxyguanosine 5'-monophosphate [21]. It was originally suggested that damage quickly shifts from the sugar (where alkoxyl radicals are often observed in nucleotides but cannot be formed without a strand break in DNA) to the bases, especially after annealing [15,80]. Alternative explanations include a low abundance of sugar radicals, multiple conformations, the similarity of the radicals' spectra, and the limitations imposed by relying solely on ESR rather than more involved techniques [14,42]. Despite these difficulties, Hüttermann and coworkers [77,78] provided the first direct evidence that sugar radicals are formed in full DNA samples and proposed the formation of the C1'-, C3'-, C4'- and C5'-centered radicals. In addition, studies of heavy-ion-beam irradiation of DNA noted the resemblance between the simulated spectra of the C4'• and C3'• radicals and the spectrum of DNA [81].
[Figure 13: The first phosphate-derived radicals observed in DNA. Panels: the C5'(H2) sugar radical, the C4'-centered sugar radical S, and the phosphorus-centered radicals P1, P2 and P3.]
Similarly, little evidence has appeared for the formation of phosphate-centered radicals. Studies on model systems show that electron capture at the phosphate group would result in cleavage of the phosphoester bond [82,83]. Additionally, sugar radicals of the form displayed in Figure 13 (C5'(H2)) have been observed, and the most likely mechanism for their formation is capture of an electron at a phosphate group [21]. Electron transfer from the phosphates to the DNA bases is also likely [83], and evidence has been obtained for rapid elimination of the phosphate-ester group through a C4'-centered sugar radical (S, Figure 13). The only direct detection of phosphate-centered radicals was obtained through heavy-ion-beam irradiation of DNA [81], where large couplings were assigned to the phosphorus atoms in the radicals displayed in Figure 13 (P1 and P2, or possibly P3). These studies make clear that damage to DNA is broader than initially expected from the two-component model, since products on all four bases and on the sugar moiety have been proposed, including sugar and phosphate radicals, despite early failures to detect radicals in the backbone of the DNA double helix. More work is required to determine the exact identity of the radical products, since structural information is difficult to obtain with the methods implemented thus far.
3.3 Effects of water on radical formation in DNA
Besides direct damage to the DNA strand, the surrounding water molecules may also be involved in radiation damage mechanisms. The hydration layer of DNA consists of a primary layer (approximately 20 or 21 water molecules per nucleotide), which upon freezing has properties different from crystalline ice, and a secondary layer, which cannot be distinguished from bulk water upon crystallization. Upon irradiation of water, many different products can be formed:

$$ {}^{\bullet}\mathrm{OH} + e^{-}_{\mathrm{aq}} + \mathrm{H}^{\bullet} + \mathrm{H_2O}^{\bullet+} + \mathrm{H}^{+} + \mathrm{H_2O_2} + \mathrm{H_2} $$
The first 14 water molecules per nucleotide in the hydration layer surrounding DNA have approximately the same mass as the nucleotide itself [84], and therefore the same number of ionizations is expected to occur in the primary hydration layer as in the DNA strand. However, it is unknown how the water molecules in the primary hydration layer are affected by radiation. One possibility is that water cations and electrons are formed, which transfer their ionic character to the DNA strand (quasi-direct effects). Water cations can also transfer protons to neighboring water molecules, resulting in hydroxyl radicals. The products formed in the hydration layer (hydroxyl radicals, hydrogen atoms or aqueous electrons) can subsequently react with DNA (indirect effects). Quasi-direct and indirect effects are expected to yield very different radicals.
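The mass argument can be made concrete with a few lines of Python. This is a rough sketch under stated assumptions: ionizations are partitioned purely by mass, and the mean mass of a DNA nucleotide residue is taken as about 309 Da (free-acid form, no counterion), a value assumed here rather than taken from the text. With these numbers, 14 water molecules (about 252 Da) come to roughly 80% of a nucleotide's mass and would receive somewhat less than half of the ionizations; including a counterion in the nucleotide mass would lower the ratio further.

```python
WATER_DA = 18.015      # molar mass of H2O (Da per molecule)
NUCLEOTIDE_DA = 309.0  # ASSUMED mean mass of one DNA nucleotide residue (free acid, no counterion)


def ionization_shares(masses: dict) -> dict:
    """Fraction of ionizations expected on each component if ionization scales with mass."""
    total = sum(masses.values())
    return {name: m / total for name, m in masses.items()}


def water_to_nucleotide_mass_ratio(n_waters: int, nucleotide_da: float = NUCLEOTIDE_DA) -> float:
    """Mass of n_waters water molecules relative to one nucleotide."""
    return n_waters * WATER_DA / nucleotide_da


if __name__ == "__main__":
    shares = ionization_shares({
        "nucleotide": NUCLEOTIDE_DA,
        "14 primary-layer waters": 14 * WATER_DA,
    })
    for name, share in shares.items():
        print(f"{name:<24} {share:.2f}")
    print(f"mass ratio (14 H2O : nucleotide) = {water_to_nucleotide_mass_ratio(14):.2f}")
```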
Perhaps the first indication of the dependence of DNA damage on hydration was reported for frozen aqueous solutions [85], where the radical yield in wet DNA was reported to be twice that obtained in dry DNA. Additionally, the yield of radical ions at 77 K was found to increase by a factor of four upon inclusion of the primary DNA hydration layer [86]. In lyophilized DNA, by contrast, the radical yield was found to increase with hydration only up to a point, after which a plateau is reached that cannot be surmounted by further increasing the level of hydration [73]. The absolute yields of the individual ion radicals have also been found to vary with hydration; for example, T•− predominates in dry DNA and C•− predominates when the hydration layer is included [75,77]. On the other hand, evidence also exists that DNA damage does not increase when the primary hydration layer is added, but does increase upon inclusion of the secondary layer. These studies include investigations of the release of unaltered bases [87], the production of base damage products (14 detected in total) [70], and the efficiency of strand breaks [88]. The investigations discussed thus far used the fact that hydroxyl radicals, hydrogen atoms and free electrons were not observed in the primary hydration layer of DNA to argue that damage due to the hydration layer must occur via quasi-direct effects. However, hydroxyl radicals may be formed but go undetected because their signals are weak, because they react rapidly with the DNA strand, or because the generated radicals quickly recombine [89]. Conversely, it is accepted that hydroxyl radicals can be formed in the secondary hydration layer, where water molecules are more loosely bound.
A major revelation in this area was obtained in a study of γ-irradiated DNA in which hydroxyl radicals were observed in low yields in the primary hydration layer; it was therefore concluded that most of the oxidative damage in the hydration layer is transferred to DNA [34a]. Reinvestigation of this problem revealed that the hydration layer can be separated into three partitions: (1) the first 9 water molecules, which do not form significant amounts of hydroxyl radicals but transfer their charge to DNA upon irradiation; (2) an additional 12 water molecules completing the primary hydration layer, which predominantly form hydroxyl radicals, although a small amount of charge transfer may also occur; and (3) bulk water, which forms hydroxyl radicals [34b]. It is still possible that hydroxyl radicals formed in the first 9 water molecules escaped detection, either because they react quickly with DNA or because they simply could not be detected by ESR. In aqueous BeF2 glasses of base derivatives, hydroxyl radicals were found to add to the C5C6 double bond in cytosine and uracil, to abstract a hydrogen atom from the methyl group in thymine, and to add to C2 in adenine [90]. In aqueous solutions, hydroxyl radicals have been determined to add to the C5C6 double bond in all pyrimidines [91] and to C4, C5 and C8 in purines [92]. Differences are also thought to exist between low-temperature glasses and frozen aqueous solutions, with indirect pathways thought to predominate in the former and quasi-direct pathways in the latter. Hüttermann et al. proposed a new mechanism for radiation damage in frozen aqueous solutions, involving oxidation at water followed by net hydroxyl radical or hydrogen atom addition to C6 in thymine and net hydrogen atom abstraction from the methyl group [93]. This is the first indication that hydroxyl radicals can take part in radiation damage to DNA components in frozen aqueous solutions [37,94], although the spectra assigned to T(C6OH) could arise from attack at C6 in thymine by a neighboring allylic radical (dimer radical) or by its own sugar group (cyclic radical) [95]. More recent work indicates that the allylic radical could be formed via a base cation without the formation of hydroxyl radicals, which contradicts this proposal [96]. Work on single crystals of DNA components has also suggested that water can be involved in the initial ionization process. Studies on single crystals of guanine derivatives determined that ionization of the surrounding water molecules must be considered to account for the formation of the identified radicals [27,29,30,97]. Comparison of calculated HFCCs with those measured in cytosine monohydrate crystals also supports water as a site of oxidative damage.
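The three-partition picture of the hydration layer [34b] can be expressed as a simple bookkeeping function. The sketch below, with the hypothetical name `partition_counts`, assumes a hydration level given as an integer number of water molecules per nucleotide and splits it into the 9-water, 12-water and bulk-like partitions described above; it does not model any radical yields.

```python
from typing import Tuple

FIRST_PARTITION = 9    # waters that mainly transfer charge to DNA
SECOND_PARTITION = 12  # waters completing the ~21-water primary layer (mainly form OH radicals)


def partition_counts(gamma: int) -> Tuple[int, int, int]:
    """Split a hydration level gamma (waters per nucleotide) into the three partitions.

    Returns (charge-transfer waters, OH-forming primary-layer waters, bulk-like waters).
    """
    if gamma < 0:
        raise ValueError("hydration level must be non-negative")
    first = min(gamma, FIRST_PARTITION)
    second = min(max(gamma - FIRST_PARTITION, 0), SECOND_PARTITION)
    bulk = max(gamma - FIRST_PARTITION - SECOND_PARTITION, 0)
    return first, second, bulk


if __name__ == "__main__":
    for gamma in (5, 15, 30):
        print(gamma, partition_counts(gamma))
```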
3.4 Major radical products formed in irradiated DNA
Ionizing radiation damages indiscriminately, and the number of initial damage products formed on a particular center is proportional to the mass of that center. Therefore, upon irradiation of a DNA strand, the primary radicals formed should include cationic and anionic radicals of each base, the sugar moiety and the phosphate group. As discussed, the yield of damage to the DNA strand has been found to increase when the hydration layer is included [70,87,88], which indicates that the water surrounding DNA plays an important role in the radiation damage mechanism. More specifically, since living organisms are largely composed of water, a model of radiation damage to DNA must also encompass the ionization of water molecules, which generates water cations and electrons. However, the abundance of other organic molecules with which these species can react, and the amount of room available for radical migration, must also be considered. Any of the primary radiation products can transform into secondary radicals by protonation or deprotonation. Early evidence for radical transfer to secondary products was obtained by recognizing the relationship between T•− and T(C6H) [60].
Due to the nature of the DNA double helix, the initial damage can be transferred through the DNA strand to produce more stable intermediate radical products. Electron transfer has been reported to occur over distances ranging from as few as three base pairs to as many as one hundred [98]. The consensus in the literature regarding radicals initially formed upon irradiation of DNA is that the primary electron-loss center is guanine and the primary electron-gain centers are cytosine and thymine. The formation of these primary products is also supported by ab initio [72] and DFT calculations. Thus, if an adenine anion is formed initially, the electron can be transferred through the DNA strand to produce either a thymine or a cytosine anion. Interbase electron transfer is possible in DNA owing to the small distance between base pairs, which results in overlap of the π-systems, and to hydrogen bonding between the bases [92]. Evidence for charge transfer through the DNA strand comes from a study that predicted thymine anions to be present in slightly larger yields in single-stranded DNA, while the cytosine anion clearly predominates in double-stranded DNA [69]. More evidence for transfer of anionic character has been obtained in single-crystal studies. For example, although the primary radicals identified in cocrystals of 1-methylcytosine and 5-fluorouracil were the cytosine anion and the uracil-centered cation, the only net hydrogen atom addition products observed evolved from the uracil anion [99]. Furthermore, in cocrystals of adenine and various uracil (or thymine) derivatives, net hydrogen atom addition adenine radicals were observed even though uracil (or thymine) anions are expected to be the primary anions formed.
However, although the adenine cation and its amino-deprotonated counterpart were observed in cocrystals of 1-methyluracil and 9-ethyladenine [100], uracil and adenine behaved as if isolated from one another, which indicates that transfer of radical character does not occur in these crystals. Negative charge can also be transferred to the DNA hydration layer. For example, Steenken suggested that upon formation of the adenine anion, proton transfer from T(N3) to A(N1) occurs, forming the thymine anion, which is subsequently protonated by a nearby water molecule to form hydroxide anions [101]. Thus, initial reduction of adenine could lead to an abundance of negative charge in the hydration layer. Alternatively, the adenine cation could transfer its non-hydrogen-bonded amino protons to a neighboring water molecule. These experimental results indicate that charge can be transferred from bases in the DNA strand to the hydration layer, where it can be stabilized or where additional water radicals can be formed to attack the bases and the sugar moiety. Long-range hole transfer in DNA, on the other hand, is considered to be more difficult. However, evidence supporting hole transfer in some crystals (co-crystallized with thio derivatives) does exist, which suggests that hole transfer may also occur in DNA [98]. For example, positive holes formed on thymine, cytosine or adenine can be transferred to guanine.

The radiation products generated in DNA will be discussed in the next two sections in terms of how the primary cation and anion radicals decay to form secondary radical products. This discussion will encompass results from single crystals [15], the aqueous state [92,101], the calculations presented in Section 2, ab initio studies [72], and studies on oriented and randomly oriented DNA [76,78].
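To give a sense of scale for the charge-transfer distances of three to one hundred base pairs quoted in this section, the following sketch converts a number of base-pair steps into an approximate distance along the helix axis. It assumes a rise of about 3.4 Å per base pair, typical of B-DNA, and ignores any deviation of the actual transfer path from the helix axis; the function name is hypothetical. Under this assumption the quoted range corresponds to roughly 10 to 340 Å.

```python
RISE_PER_BP = 3.4  # ASSUMED helical rise of B-DNA, in angstroms per base-pair step


def transfer_distance(n_bp_steps: int, rise: float = RISE_PER_BP) -> float:
    """Approximate distance (angstroms) spanned by n_bp_steps base-pair steps along the helix axis."""
    if n_bp_steps < 0:
        raise ValueError("number of base-pair steps must be non-negative")
    return n_bp_steps * rise


if __name__ == "__main__":
    for n in (3, 100):
        print(f"{n:>3} bp -> ~{transfer_distance(n):.0f} angstroms")
```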
3.5 DNA cations and secondary radicals
Cations can be formed via direct ionization of the DNA strand or through transfer of the positive charge from irradiated water molecules in the hydration layer. Sugar radical cations can be formed via transfer of the radical character from the base cations. Once formed, a cation can recapture an electron, generated by ionization of either water or the DNA strand, and thereby heal the damage, or positive hole transfer can occur, guanine being the favored electron-deficient center in DNA. At higher temperatures, in particular those of biological systems, neutral radicals are more probable and cations are expected to deprotonate. However, in experimental studies on DNA it is difficult to determine the protonation state of the primary radical products. This is clearly seen from theoretical calculations on model systems [1-4], which show very little difference in, for example, the spin densities of cations and their deprotonated counterparts. The thymine cation has not been identified in experiments on single crystals [15], and ab initio calculations predict that this base has the largest IP [72]. However, T(CH2) has been identified in all thymine derivatives [15], an assignment supported by HFCCs calculated with DFT. Thus, assuming that the thymine cation is stabilized long enough in DNA to allow deprotonation, the most abundant secondary thymine radical would be formed via loss of a methyl proton. This hypothesis is supported by the fact that T(CH2) has been identified in the most complete studies on both oriented fibers [76] and randomly oriented DNA [77,78]. Studies of the redox properties of base pairs indicate that one-electron-oxidized thymine in DNA should be characterized by both T•+ and T(N3), implying that proton transfer can possibly occur [101].
The T(N3) radical, however, has not been identified in single crystals through comparison of calculated and experimental HFCCs, even in studies on single crystals of base pairs. Moreover, this radical has not been suggested to be formed in full DNA. This indicates that proton transfer cannot compete with deprotonation at the methyl group. Little experimental evidence has been obtained for the formation of the cytosine cation. Early ESR studies predicted that the cytosine cation is formed in cytosine monohydrate crystals; however, ENDOR measurements showed this assignment to be unlikely [15]. In single crystals of deoxycytidine 5'-monophosphate, the cytosine cation was also postulated, but the HFCCs did not match those calculated with DFT. The only direct successor of this cation discussed in the literature is the radical formed via net hydrogen loss at N1. In cytosine monohydrate crystals, this radical product was postulated, but comparison with calculated HFCCs led to a new mechanism involving oxidation at water rather than at cytosine. The N1-deprotonated cytosine radical is irrelevant to DNA, since the hydrogen at N1 is replaced by deoxyribose. On the other hand, sugar radicals have been observed in some cytosine derivatives [15]. These radicals could be formed from the cytosine cation, with the cationic character transferred to deoxyribose and deprotonation subsequently occurring at the sugar moiety. The instability of the cytosine cation in single crystals indicates that the formation of the cytosine cation, or of its secondary radical products, is unlikely in irradiated DNA. This agrees with results from the redox properties of the base pairs, which determined that the cytosine cation will not deprotonate since guanine is such a weak base [101]. In addition, since cytosine is base paired with guanine, which is well accepted to be the ultimate cationic site in irradiated DNA, transfer of the positive charge from cytosine to guanine (or to the sugar moiety) is more likely than formation of a cytosine radical by deprotonation.
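Many of the assignments above rest on comparing calculated hyperfine coupling constants (HFCCs) with measured ones. The Python sketch below shows one simplified way to rank candidate radicals: it assumes that the couplings of each candidate have already been matched one-to-one, nucleus by nucleus, to the experimental values (for example as isotropic couplings in MHz) and picks the candidate with the smallest root-mean-square deviation. Real assignments compare full anisotropic tensors, including principal directions, so this illustrates the idea rather than the procedure used in the cited studies; the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def rms_deviation(calc: Sequence[float], exp: Sequence[float]) -> float:
    """Root-mean-square difference between couplings matched nucleus by nucleus."""
    if len(calc) != len(exp) or not exp:
        raise ValueError("coupling lists must be non-empty and matched one-to-one")
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, exp)) / len(exp))


def best_assignment(experimental: Sequence[float],
                    candidates: Dict[str, Sequence[float]]) -> str:
    """Name of the candidate radical whose calculated couplings deviate least from experiment."""
    return min(candidates, key=lambda name: rms_deviation(candidates[name], experimental))
```

A candidate with a small RMS deviation is only a plausible assignment; as the cytosine examples above show, a poor match can be just as informative, ruling out a proposed radical and prompting an alternative mechanism.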
The adenine cation has not been confidently assigned through comparison of calculated HFCCs with those obtained from single crystals of nonprotonated adenine derivatives, unless co-crystallized with another base derivative. However, a study on cocrystals of 1-methyluracil and 9-ethyladenine detected the adenine cation at 10 K [100], and the HFCCs agree well with the calculated values. Furthermore, the cation can be observed in protonated crystals [15]. The extreme conditions under which the adenine cation was observed in these studies do not apply to full DNA. Deprotonation of the adenine cation is expected to occur primarily at the amino group. In single crystals it has been determined that this radical is formed if one of the amino hydrogens is involved in a hydrogen bond to a site that can transfer the damage further away from the initial adenine molecule [20]. In DNA, the proton could be transferred through the hydrogen bond formed with the base-pair thymine, although further transfer through a hydrogen-bond network is not possible. In cocrystals of 1-methylthymine and 9-methyladenine, no products formed via deprotonation of the adenine cation were detected, which was taken to indicate that proton transfer between adenine and thymine is unlikely [102]. These results indicate that stacking and hydrogen-bonding effects are not sufficient for radical stabilization. In solution, it has been determined that although the adenine cation is a strong acid, thymine is a poor base and therefore will not abstract a proton from adenine [101]. Ab initio calculations also predict that proton transfer is not favorable in adenine-thymine ion pairs [72]. These results indicate that the effects of base pairing on the formation of the adenine cation or its secondary radicals in DNA are unknown, and hydrogen transfer between base pairs cannot be used to justify the most abundant adenine deprotonated radical. An alternative route to A(N6H) in DNA is removal of the amino hydrogen not involved in base-pair hydrogen bonding. In some adenine crystals, the C1' sugar radical (C1'•) was detected and postulated to be formed from the adenine cation [15]. Thus, if an adenine cation is stabilized for longer than the time required to transfer its cationic character to guanine, either deprotonation at the amino group or transfer of the cationic character to the sugar moiety is expected. As discussed, it is agreed in the literature that guanine is the major oxidation site in DNA. Ab initio calculations on base pairs indicate that, relative to the isolated bases guanine and adenine, the IP of the guanine-cytosine base pair is lowered to a greater extent than that of the adenine-thymine base pair [103]. This lends further support to guanine being the major positive center in DNA.
Despite this, the HFCCs calculated with DFT do not support the experimental assignment to the guanine cation in single crystals. Deprotonation of the guanine cation is also expected in solution; however, the equilibrium constant was determined to be small. The primary product formed via deprotonation of this cation in single crystals is G(N2H), whereas in solution deprotonation primarily occurs at N1 [92]. In DNA, deprotonation at N1 or at the amino group is possible through transfer along a hydrogen bond to cytosine. However, since N3 has been determined to be the most likely site for protonation in cytosine (to be discussed in Section 3.6), transfer from N1 may be favored in DNA. Ab initio calculations have determined that the guanine-cytosine base pair cation can readily undergo proton transfer along the C(N3)-G(N1H) bond: the activation barrier was calculated to be only 0.9 kcal/mol after correction for zero-point vibrational effects, and the products are only 1.6 kcal/mol higher in energy than the reactants [103]. Alternatively, if transfer does not occur through the hydrogen bonds, but protons are instead released into the surrounding environment as proposed for adenine, then the amino hydrogen not involved in a hydrogen bond can be lost. Only the G(N1) deprotonated product has been identified thus far in studies of randomly oriented DNA [78].

It has been suggested that, since the predicted total yield of anions exceeds the total yield of cations in DNA by a ratio of 1.4:1, some cations may have gone undetected. This suggests that oxidation may also occur on the DNA sugar moiety. Deoxyribose has an IP larger than those of the bases but smaller than that of the phosphate group [72], indicating that cation formation could occur on this center. It should be noted, however, that calculations accounting for the phosphate hydration layer indicate that the IPs of the sugar and phosphate groups are more similar to one another [72]. In single crystals, direct oxidation of the sugar moiety is expected to result in alkoxyl radicals, which are commonly observed in various base derivatives [15]. Other sugar radicals can be formed directly from alkoxyl radicals, or hydrogen atoms can be abstracted by neighboring molecules in the single crystals. Oxidation of a base followed by transfer of the radical character to the sugar moiety can also result in deoxyribose radicals. However, transfer of radical character from the sugar to the base was observed at 200 K in single crystals of 2'-deoxyguanosine 5'-monophosphate, so this pathway may not be relevant to radiation effects in living systems. Any of the mechanisms discussed for the formation of sugar radicals can be expected to lead to deprotonation at any of the carbons (C1' to C5'). In studies on single crystals of base derivatives [15,21], the C1' position appears to be the favored site for deprotonation. It is speculated that thymine and guanine derivatives are more likely to deprotonate at the base than to transfer character to the sugar group, given the abundant formation of alternative deprotonated radicals.
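To put the guanine-cytosine proton-transfer energetics quoted earlier in this section [103] into perspective, the sketch below evaluates a two-state Boltzmann population and an Eyring (transition-state-theory) rate constant. It assumes that the zero-point-corrected energies (1.6 kcal/mol reaction energy, 0.9 kcal/mol barrier) can stand in for free energies, takes a transmission coefficient of 1 and neglects tunnelling; these are simplifications introduced here, not part of the cited calculations. Under these assumptions, the proton-transferred form makes up only about 6% of the population at room temperature, while the barrier is low enough that the transfer itself would be very fast.

```python
import math

R_KCAL = 1.987204e-3        # gas constant, kcal/(mol K)
K_B = 1.380649e-23          # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34   # Planck constant, J s


def product_fraction(delta_e: float, temp: float) -> float:
    """Equilibrium fraction of the higher-energy (proton-transferred) state in a two-state model."""
    k_eq = math.exp(-delta_e / (R_KCAL * temp))
    return k_eq / (1.0 + k_eq)


def eyring_rate(barrier: float, temp: float) -> float:
    """Eyring rate constant in s^-1 (transmission coefficient 1, no tunnelling)."""
    return (K_B * temp / H_PLANCK) * math.exp(-barrier / (R_KCAL * temp))


if __name__ == "__main__":
    # Energies from the text (kcal/mol), used here as stand-ins for free energies.
    reaction_energy, barrier = 1.6, 0.9
    for temp in (77.0, 298.15):
        print(f"T = {temp:6.2f} K: product fraction {product_fraction(reaction_energy, temp):.2e}, "
              f"k = {eyring_rate(barrier, temp):.2e} s^-1")
```

At 77 K, the temperature of many of the ESR studies discussed here, the same model gives a negligible equilibrium population of the proton-transferred form.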
The C1'-centered radical has been suggested as a product in oriented fibers [76] and randomly oriented DNA [77,78]. The formation of the C3'-, C4'- and C5'-centered radicals was also postulated in DNA samples [78]. In contrast, the C2' radical has not been suggested to form in DNA. This is supported by both ab initio [72] and DFT calculations [5], which predict the C2' radical to be much higher in energy than the other carbon-centered radicals, which are all very close in energy. Additional sugar radicals involving considerably more damage to the sugar ring than breakage of one bond have been observed in single-crystal studies. The relevance of these structures to DNA is unknown at this time, since none of these products has been observed in irradiated DNA samples.
Products formed by loss of an electron from the phosphate group have not been identified in single-crystal studies of base derivatives or studies on full DNA. Experiments and calculations indicate that the IP of the phosphate group in DNA or outside the helix is low [72]. However, if an environment which is more relevant to biological systems is considered (for example, inclusion of solvation or countefion effects), then the IP increases by a factor of 2 to 2.5 [72]. Thus, products generated by loss of an electron from the phosphate groups are unexpected in DNA. It is postulated that these radicals are quickly repaired by capture of an electron. The role the water encompassing the DNA strand plays in radiation damage appears to be unsettled. However, it is agreed that water is primarily involved in the radiation process through an oxidation-type mechanism. Oxidation of water leads to free electrons and H20 +, which can dissociate to form protons and hydroxyl radicals. The hydroxyl radicals can subsequently react with any of the undamaged bases or the sugar group. Aqueous [91,92] and solid-state [34b] results predict that the primary sites for hydroxyl radical addition is across the C5C6 double bond in the pyrimidines and at C8 in the purines, as well as C2 in adenine. In a study of randomly oriented DNA [78], a secondary product was identified to be generated through radical addition to C8 in one of the purines. This species could be attributed to hydroxyl radical addition to C8 in guanine or adenine. Alternatively, hydroxyl radicals can abstract a hydrogen atom to form, for example, T(CH2) or carbon-centered radicals in deoxyribose. Whether hydroxyl radicals prefer to abstract hydrogen from the sugar moiety or add to the bases remains to be determined. It should be noted that although the secondary radicals mentioned in the present section were discussed in terms of formation from the primary cationic centers, other pathways may lead to the equivalent species. 
For example, irradiation of DNA can generate excited species. The excess energy on these centers can be relieved by dissociation of an X-H bond, which would result in radical products equivalent to those discussed above. Excitation could occur at the bases to yield, for example, T(CH2), or at the sugar group to yield any of the net hydrogen-atom removal radicals (C1' to C5').
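The water-derived chain of events described earlier in this section can be summarized as a reaction scheme. It covers the ionization of water, the dissociation of the water cation, and the two competing fates of the hydroxyl radical. This is a simplified sketch. The proton is written as a bare H+, although in practice it is hydrated. "base" denotes addition at the sites named above (the C5C6 bond of the pyrimidines, C8 of the purines, C2 of adenine). R-H stands for the thymine methyl group or a deoxyribose C-H bond.

```latex
\begin{align*}
\mathrm{H_2O} &\xrightarrow{\text{ionizing radiation}} \mathrm{H_2O^{\bullet+}} + e^- \\
\mathrm{H_2O^{\bullet+}} &\longrightarrow \mathrm{H^+} + {}^{\bullet}\mathrm{OH} \\
{}^{\bullet}\mathrm{OH} + \text{base} &\longrightarrow {}^{\bullet}\text{base(OH)} \qquad \text{(addition)} \\
{}^{\bullet}\mathrm{OH} + \mathrm{R{-}H} &\longrightarrow \mathrm{R}^{\bullet} + \mathrm{H_2O} \qquad \text{(hydrogen abstraction)}
\end{align*}
```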
3.6 DNA anions and secondary radicals
The generation of cations through irradiation of DNA and its surrounding water molecules yields a supply of electrons that can add to the DNA strand to generate anionic centers. As with the cations, these anions may be stable under extreme conditions, but they can be expected to protonate rapidly at elevated temperatures. The protons can be obtained from deprotonation of the
base, sugar or water cations. The protonation state of the anions in DNA is difficult to determine. In particular, if the added proton lies in the molecular plane, which is often the case, the resulting HFCCs are very small and extremely difficult to detect, even with the sophisticated ENDOR technique. Comparison of data from single crystals [15] with DFT calculations [14] shows that, at 10 K, the thymine and cytosine anions are protonated in many different crystals. Since radicals formed through net hydrogen-atom addition have been observed by ENDOR spectroscopy even at low temperatures in single crystals, it seems likely that thymine and cytosine radicals should also exist as neutral species in irradiated DNA. The most probable protonation sites are O4 in thymine and N3 in cytosine. These protonation sites are even more likely in full DNA samples owing to the hydrogen-bonding interactions between the base pairs. In particular, the ease of proton transfer along the C(N3)-G(N1H) bond in the guanine-cytosine base-pair cation has already been discussed, and ab initio calculations have shown proton transfer to be favorable in guanine-cytosine ion pairs [72]. Furthermore, if the cytosine anion (a strong base) is formed, it is base paired with guanine (a strong acid), and proton transfer is very favorable [101]. T(O4H) and C(N3H) have been proposed to form in full DNA [76,78]. Protonation is also possible along the C5C6 double bond in both pyrimidines. The thymine C6-hydrogenated radical was observed in the first ESR studies of irradiated DNA [56] and has since been identified with more advanced methods [76,78]. This radical is expected to predominate because adenine is a weak acid and therefore cannot donate a proton to its thymine base-pair partner at the O4 position.
Ab initio calculations have shown that proton transfer across the T(N3H)-A(N1) bond in the adenine-thymine base-pair cation is unfavorable [103]. Transfer between T(O4H) and the adenine amino group was not investigated. However, other calculations have shown that proton transfer is not favorable in adenine-thymine ion pairs [72]. Furthermore, single-crystal studies indicate that proton transfer is less favorable across a hydrogen bond in which the acceptor is a carbonyl oxygen (=O) [20]. Thus, there is evidence suggesting that proton transfer across the T(O4)-A(N6H) hydrogen bond may be slow. Other proton-donating agents (such as water or free protons generated from deprotonation of base cations) therefore have an opportunity to react with the thymine anion. In particular, protonation is expected to occur at C6 (or C5) in thymine [T(C6H) or T(C5H)].
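Whether a proton moves across a base-pair hydrogen bond depends on the relative acidity of donor and acceptor. The function below gives a rough thermodynamic illustration. It is not the ab initio approach used in the studies cited here. It uses the standard relation between pKa values and the free energy of HA + B- -> A- + HB, that is, dG = 2.303RT[pKa(HA) - pKa(HB)]. The function name and the pKa inputs are hypothetical placeholders. Real values for base ions and radicals inside DNA differ from aqueous ones and would have to be taken from the literature.

```python
import math

# Gas constant in kJ mol^-1 K^-1.
R_KJ = 8.314462618e-3


def proton_transfer_free_energy(pka_donor: float, pka_acceptor_acid: float,
                                temperature: float = 298.15) -> float:
    """Standard free energy change (kJ/mol) for HA + B- -> A- + HB.

    pka_donor:          pKa of the proton donor HA.
    pka_acceptor_acid:  pKa of HB, the conjugate acid of the acceptor B-.
    A negative result means the transfer is thermodynamically favorable.
    """
    # Equilibrium constant K = Ka(HA) / Ka(HB) = 10**(pKa(HB) - pKa(HA)),
    # so dG = -RT ln K = -2.303 RT [pKa(HB) - pKa(HA)].
    log10_k = pka_acceptor_acid - pka_donor
    return -R_KJ * temperature * math.log(10.0) * log10_k


if __name__ == "__main__":
    # Hypothetical inputs that only illustrate the sign convention.
    strong_acid_strong_base = proton_transfer_free_energy(4.0, 10.0)
    weak_acid_weak_base = proton_transfer_free_energy(10.0, 4.0)
    print(f"strong acid + strong base: {strong_acid_strong_base:.1f} kJ/mol")
    print(f"weak acid + weak base:     {weak_acid_weak_base:.1f} kJ/mol")
```

With these placeholder values, the first case gives about -34 kJ/mol (favorable). The second gives about +34 kJ/mol. This mirrors the qualitative contrast drawn above between the guanine-cytosine and adenine-thymine pairs.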
In addition to the C(N3H) product, the cytosine N4-protonated radical [C(N4H)] has been proposed experimentally for full DNA samples [78]. This radical has been observed in single crystals of cytosine hydrochloride [104]. Couplings calculated with DFT for this radical agree well with experiment, even though the chloride counterions were not included in the model system [39]. If protonation from the neighboring guanine is slow, the N4-hydrogenated radical may form. Moreover, the radicals formed by protonation across the C5C6 double bond [C(C5H) or C(C6H)] could be generated. Both have been observed in single crystals, and their assignment is supported by the DFT calculations cited in Section 2. The C(C6H) product has also been observed in deuterated DNA samples, where a deuteron adds to C6. However, as indicated by ab initio calculations, proton transfer is favorable in the guanine-cytosine base-pair ions, and C(N3H) is probably the predominant cytosine net hydrogen-addition product [72]. Interestingly, cytosine has one more probable protonation product than thymine. Since the cytosine anion and its protonated analogs are difficult to distinguish by ESR, this could explain the higher yield of the cytosine anion observed experimentally. The adenine anion has also been found to be protonated in single crystals at very low temperatures. The main protonation site in single crystals is N3, which is supported by DFT calculations [4]. Furthermore, protonation can occur at both C2 and C8; these sites are favored in single crystals when N3 is not involved in a hydrogen bond [15]. In aqueous solution, the adenine anion has been shown to accept a proton at N1 from N3 of thymine [101]. This can be followed by a 1,2-shift to form the A(C2H) product [92].
Only the A(N3H) product has been assigned in oriented DNA [76]. However, a product identified in randomly oriented DNA was assigned to a net radical addition product at C8 in one of the purines [78], which could be associated with A(C8H). The guanine anion has been suggested as a product in some single crystals. However, the other three bases were found to be protonated even at low temperatures, and the anion and its protonated form have similar characteristics. It is therefore unlikely that the guanine anion will be observed directly in irradiated DNA samples. Comparison of single-crystal and calculated results indicates that the primary protonation site of the guanine anion is O6. In full DNA, this position is hydrogen bonded to the amino group of the partner cytosine. However, the amino-dehydrogenated cytosine radical has not been observed in either single crystals or irradiated DNA. Furthermore, studies in aqueous solution show that cytosine is a weak acid [101]. Thus, a simple proton
transfer mechanism seems unlikely. Comparison of single-crystal results and calculations indicates that alternative protonation sites include C8 and C5. Electron capture at the sugar group is not expected to occur. This is primarily because the electron affinities of the bases are much larger than that of the sugar group, so the bases are expected to shield deoxyribose. However, a radical formed by rupture of the phosphoester bond at C5' was found at 10 K in 2'-deoxyguanosine 5'-monophosphate (C5'(H2), Figure 13) [21]. Since this radical was formed at such a low temperature, it may be generated through a reductive pathway at the sugar group rather than through transfer of radical character from the base. Products generated by electron capture at the sugar were not previously expected as forms of DNA damage. Nevertheless, a reductive mechanism involving deoxyribose cannot be ruled out for radical formation. In addition, a similar radical could be formed at the C3' position (C3'(H)). If these radicals are generated in irradiated DNA, a prompt strand break will occur. It should be noted that hydrogen abstraction radicals have been shown to be products of reduction pathways in related sugars [105]. The phosphate group is also a possible site for electron capture. Two phosphate-centered radicals were discussed in a previous section and speculated to arise from electron gain on the phosphates at either C3' or C5' (P1 or P2, Figure 13) [81]. Radical character could also be transferred to the sugar moiety. Alternatively, as discussed in Section 3.5, electron capture at the phosphate group could lead to elimination of this group through formation of the C5'(H2) or C3'(H) sugar products, and hence to strand breaks in DNA. This is thought to occur mainly through abstraction of hydrogen from C4', which forms a radical at this center [21,106].
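A small lookup table makes explicit the bookkeeping behind the statement that cytosine has one more probable protonation product than thymine. The sketch is illustrative only. The table lists the net hydrogen-addition products of the four base anions as named in this section: O4, C5 and C6 for thymine; N3, N4, C5 and C6 for cytosine; N3, C2 and C8 for adenine; O6, C8 and C5 for guanine. The names PROTONATION_PRODUCTS and product_counts are hypothetical.

```python
# Candidate net hydrogen-addition (protonation) products of the base anions,
# as named in this section. This is bookkeeping only, not a kinetic model.
PROTONATION_PRODUCTS: dict[str, list[str]] = {
    "thymine": ["T(O4H)", "T(C6H)", "T(C5H)"],
    "cytosine": ["C(N3H)", "C(N4H)", "C(C5H)", "C(C6H)"],
    "adenine": ["A(N3H)", "A(C2H)", "A(C8H)"],
    "guanine": ["G(O6H)", "G(C8H)", "G(C5H)"],
}


def product_counts(table: dict[str, list[str]]) -> dict[str, int]:
    """Return the number of candidate protonation products for each base."""
    return {base: len(products) for base, products in table.items()}


if __name__ == "__main__":
    counts = product_counts(PROTONATION_PRODUCTS)
    for base, n in counts.items():
        print(f"{base:9s} {n}")
    print("cytosine minus thymine:", counts["cytosine"] - counts["thymine"])
```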
It should be noted that the products discussed in this section could also be formed by hydrogen-atom addition. These hydrogen atoms can be generated by recombination of an electron and a proton, or as products following excitation of the bases or the sugar moiety. For example, in randomly oriented DNA a radical product was identified as being formed by radical addition to C8 in one of the purines (adenine or guanine) [78].
4. A MULTI-COMPONENT MODEL FOR DNA RADIATION DAMAGE
Figure 14 summarizes the explanations provided in the previous sections for the effects of radiation on the entire DNA strand and the surrounding water molecules. The diagram depicts the formation of the primary radicals (cation and anion radicals) on all bases (T, C, A, G), the phosphate group (P), the sugar moiety (S) and the surrounding water molecules (W). The transformation of each primary radical into secondary radicals is also displayed. The protonation of anions and deprotonation of cations compete directly with electron transfer throughout the DNA strand; for simplicity, the electron-transfer mechanisms are not shown in the diagram. Thus, the formation of secondary radical products depends on whether the anion is stabilized long enough for protonation to occur (or, equivalently, the cation long enough for deprotonation). Alternatively, hydrogen atoms or hydroxyl radicals can attack the undamaged bases to form the radical products included in the model. The model presented in Figure 14 indicates that a primary product can directly form a secondary radical. For example, the thymine cation can deprotonate to form the methyl-dehydrogenated product. In an alternative pathway, the primary radicals react to form radical products on another center. For example, the cytosine cation was concluded not to deprotonate but rather to form a sugar radical (indicated by a horizontal line in the figure), which subsequently forms a sugar deprotonated radical. Similarly, water cations form hydroxyl radicals that can abstract a hydrogen atom from the thymine methyl group or from deoxyribose. The protons formed from the water cations, in addition to the hydroxyl radicals, can add to any of the base anions to form protonated products (these processes are also indicated by horizontal lines in the figure). The model developed in this chapter and displayed in Figure 14 shows that the number of possible radical products in irradiated DNA is very large. Since these are the most probable radical products in irradiated DNA, this model may be useful when attempting to characterize the ESR spectra of DNA.
[Figure 14 diagram: "DNA + Radiation" leading to primary cation and anion radicals on T, C, A, G, P, S and W, and to secondary products including T(O4H), T(C6H), T(C5H), T(CH2), C(N3H), C(N4H), C(C5H), C(C6H), A(N3H), A(C2H), A(C8H), A(N6H), G(O6H), G(C8H), G(C5H), G(N2H), G(N1), C/T(C5OH), C/T(C6OH), A/G(C8OH), A(C2OH), P1, P2, the sugar radicals C1', C3', C4' and C5', C5'(H2) and C3'(H).]
Figure 14: A model for radiation damage to DNA which includes damage to the bases, the sugar moiety, the phosphate group and the surrounding water molecules.
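One way to see how the model branches, and how the same secondary radical can be reached by different routes, is to treat Figure 14 as a directed graph. The sketch below is a hypothetical, partial encoding. Node labels follow the figure, and only transformations named in the text are included. Electron-transfer steps and co-reactants (such as the proton consumed when an anion is protonated) are left out. MODEL, reachable and precursors are illustrative names, not part of any published software.

```python
from collections import deque

# Hypothetical, partial encoding of the Figure 14 model: each species maps to
# the species it can give rise to. Electron transfer and co-reactants are not
# tracked, so "H+" appears only as a product of the water cation.
MODEL: dict[str, list[str]] = {
    "DNA": ["T+", "T-", "C+", "C-", "A+", "A-", "G+", "G-", "P-", "S-", "W+"],
    "T+": ["T(CH2)"],                     # methyl deprotonation
    "C+": ["S+"],                         # transfer to the sugar, not deprotonation
    "S+": ["C1'", "C3'", "C4'", "C5'"],   # sugar radicals labeled in Figure 14
    "W+": ["OH", "H+"],                   # water cation -> hydroxyl radical + proton
    "OH": ["T(CH2)", "C1'", "C3'", "C4'", "C5'",              # hydrogen abstraction
           "C/T(C5OH)", "C/T(C6OH)", "A/G(C8OH)", "A(C2OH)"],  # addition to bases
    "T-": ["T(O4H)", "T(C6H)", "T(C5H)"],
    "C-": ["C(N3H)", "C(N4H)", "C(C5H)", "C(C6H)"],
    "A-": ["A(N3H)", "A(C2H)", "A(C8H)"],
    "G-": ["G(O6H)", "G(C8H)", "G(C5H)"],
    "P-": ["P1", "P2", "C5'(H2)", "C3'(H)"],
    "S-": ["C5'(H2)", "C3'(H)"],
}


def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search: every species that can arise from `start`."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def precursors(graph: dict[str, list[str]], product: str) -> list[str]:
    """Species with a direct edge to `product`, i.e. alternative formation routes."""
    return sorted(src for src, dsts in graph.items() if product in dsts)


if __name__ == "__main__":
    print(len(reachable(MODEL, "DNA")), "species reachable from DNA + radiation")
    for radical in ("T(CH2)", "C5'(H2)", "C1'"):
        print(radical, "<-", ", ".join(precursors(MODEL, radical)))
```

For example, precursors(MODEL, "T(CH2)") returns both the hydroxyl radical and the thymine cation. This reflects the point made above that different pathways can lead to the same species.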
To narrow down the set of candidate radical products further, more experimental work is needed to confirm or rule out each product. For example, many experimental studies have shown that a specific radical cannot be excluded merely because its signal is not observed by ESR, since a strong ENDOR signal is often obtained from the same sample. It is anticipated that, as experimental techniques become more advanced and able to characterize more products, evidence will be obtained to support the current working model for radiation damage to DNA.
5. CONCLUDING REMARKS
The discussion presented in this chapter illustrates the diversity of radical products generated in irradiated DNA samples. The knowledge of which
radicals are formed has important consequences for determining the type of damage exhibited (for example, strand breaks, tandem lesions, DNA-protein cross-links, unaltered base release). The model outlined herein is much more complex than the original two-component model, which postulated that initial radiation damage centers on the formation of only two ionic radicals. Moreover, early researchers have occasionally claimed that the "complexity of the DNA radical population" can be explained by the formation of four radicals [85]. The multi-component model presented herein shows that this is clearly not the case. Determining which radicals are generated upon irradiation of DNA opens a broader area of research. That research can investigate how these radicals are formed and, more importantly, how they subsequently react to cause permanent damage to the DNA strand.
6. ACKNOWLEDGEMENTS
We thank the Natural Sciences and Engineering Research Council of Canada (NSERC), the Swedish Natural Science Research Council (NFR), and the Killam Trusts for financial support.
REFERENCES
1. S. D. Wetmore, R. J. Boyd and L. A. Eriksson, J. Phys. Chem. B, 102 (1998) 5369.
2. S. D. Wetmore, F. Himo, R. J. Boyd and L. A. Eriksson, J. Phys. Chem. B, 102 (1998) 7484.
3. S. D. Wetmore, R. J. Boyd and L. A. Eriksson, J. Phys. Chem. B, 102 (1998) 9332.
4. S. D. Wetmore, R. J. Boyd and L. A. Eriksson, J. Phys. Chem. B, 102 (1998) 10602.
5. S. D. Wetmore, R. J. Boyd and L. A. Eriksson, J. Phys. Chem. B, 102 (1998) 7674.
6. A. D. Becke, J. Chem. Phys., 98 (1993) 1372.
7. C. Lee, W. Yang and R. G. Parr, Phys. Rev. B, 37 (1988) 785.
8. R. Ditchfield, W. J. Hehre and J. A. Pople, J. Chem. Phys., 54 (1971) 724; W. J. Hehre, R. Ditchfield and J. A. Pople, J. Chem. Phys., 56 (1972) 2257; P. C. Hariharan and J. A. Pople, Mol. Phys., 27 (1974) 209; M. S. Gordon, Chem. Phys. Lett., 76 (1980) 163; P. C. Hariharan and J. A. Pople, Theor. Chim. Acta, 28 (1973) 213; A. D. McLean and G. S. Chandler, J. Chem. Phys., 72 (1980) 5639; R. Krishnan, J. S. Binkley, R. Seeger and J. A. Pople, J. Chem. Phys., 72 (1980) 650; T. Clark, J. Chandrasekhar, G. W. Spitznagel and P. v. R. Schleyer, J. Comput. Chem., 4 (1983) 294; M. J. Frisch, J. A. Pople and J. S. Binkley, J. Chem. Phys., 80 (1984) 3265.
9. J. P. Perdew and Y. Wang, Phys. Rev. B, 33 (1986) 8800.
10. (a) J. P. Perdew, Phys. Rev. B, 33 (1986) 8822; (b) J. P. Perdew, Phys. Rev. B, 34 (1986) 7406.
11. M. J. Frisch, G. W. Trucks, H. B. Schlegel, P. M. W. Gill, B. G. Johnson, M. A. Robb, J. R. Cheeseman, T. A. Keith, G. A. Petersson, J. A. Montgomery, K. Raghavachari, M. A. Al-Laham, V. G. Zakrzewski, J. V. Ortiz, J. B. Foresman, J. Cioslowski, B. B. Stefanov, A. Nanayakkara, M. Challacombe, C. Y. Peng, P. Y. Ayala, W. Chen, M. W. Wong, J. L. Andres, E. S. Replogle, R. Gomperts, R. L. Martin, D. J. Fox, J. S. Binkley, D. J. Defrees, J. Baker, J. P. Stewart, M. Head-Gordon, C. Gonzalez and J. A. Pople, Gaussian 94 (Revision B.2), Gaussian, Inc., Pittsburgh, PA, 1995.
12. A. St-Amant and D. R. Salahub, Chem. Phys. Lett., 169 (1990) 387; A. St-Amant, Ph.D. thesis, Université de Montréal, 1991; D. R. Salahub, R. Fournier, P. Mlynarski, I. Papai, A. St-Amant and J. Ushio, in Density Functional Methods in Chemistry, J. Labanowski and J. Andzelm (Eds.), Springer, New York, 1991.
13. L. A. Eriksson, Mol. Phys., 91 (1997) 827.
14. (a) V. G. Malkin, O. L. Malkina, L. A. Eriksson and D. R. Salahub, in Modern Density Functional Theory: A Tool for Chemistry, J. M. Seminario and P. Politzer (Eds.), Elsevier, New York, 1995; (b) B. Engels, L. A. Eriksson and S. Lunell, Adv. Quantum Chem., 27 (1997) 297; (c) L. A. Eriksson, in Encyclopedia of Computational Chemistry, P. v. R. Schleyer (Ed.), Wiley and Sons, New York, 1998; (d) L. A. Eriksson and F. Himo, Trends in Physical Chemistry, 6 (1997) 153.
15. D. M. Close, Radiat. Res., 135 (1993) 1.
16. K. Miaskiewicz, J. Miller and R. Osman, Int. J. Radiat. Biol., 63 (1993) 677.
17. E. Sagstuen, E. O. Hole, W. H. Nelson and D. M. Close, J. Phys. Chem., 96 (1992) 1121.
18. L. Kar and W. A. Bernhard, Radiat. Res., 93 (1983) 232.
19. D. M. Close and W. H. Nelson, Radiat. Res., 117 (1989) 367.
20. W. H. Nelson, E. Sagstuen, E. O. Hole and D. M. Close, Radiat. Res., 149 (1998) 75.
21. E. O. Hole, W. H. Nelson, E. Sagstuen and D. M. Close, Radiat. Res., 129 (1992) 119.
22. A.-O. Colson and M. D. Sevilla, J. Phys. Chem., 100 (1996) 4420.
23. E. O. Hole, W. H. Nelson, D. M. Close and E. Sagstuen, J. Chem. Phys., 86 (1987) 5218.
24. E. O. Hole, E. Sagstuen, W. H. Nelson and D. M. Close, Radiat. Res., 129 (1992) 1.
25. E. O. Hole, E. Sagstuen, W. H. Nelson and D. M. Close, J. Phys. Chem., 95 (1991) 1494.
26. W. H. Nelson, E. Sagstuen, E. O. Hole and D. M. Close, Radiat. Res., 131 (1992) 272.
27. D. M. Close, W. H. Nelson and E. Sagstuen, Radiat. Res., 112 (1987) 283.
28. E. Sagstuen, E. O. Hole, W. H. Nelson and D. M. Close, Radiat. Res., 116 (1988) 196.
29. W. H. Nelson, E. O. Hole, E. Sagstuen and D. M. Close, Int. J. Radiat. Biol., 54 (1988) 963.
30. E. O. Hole, E. Sagstuen, W. H. Nelson and D. M. Close, Radiat. Res., 125 (1991) 119.
31. F. Jolibois, J. Cadet, A. Grand, R. Subra, N. Rega and V. Barone, J. Am. Chem. Soc., 120 (1998) 1864.
32. E. Sagstuen, E. O. Hole, W. H. Nelson and D. M. Close, J. Phys. Chem., 96 (1992) 8269.
33. W. Hiraoka, M. Kuwabara, F. Sato, A. Matsuda and T. Ueda, Nucl. Acids Res., 18 (1990) 1217.
34. (a) D. Becker, T. La Vere and M. D. Sevilla, Radiat. Res., 140 (1994) 123; (b) D. Becker, T. La Vere and M. D. Sevilla, Radiat. Res., 145 (1996) 673.
35. M. Wala, E. Bothe, H. Görner and D. Schulte-Frohlinde, J. Photochem. Photobiol. A: Chem., 53 (1990) 87.
36. (a) D. Chapman and C. Gillespie, Adv. Radiat. Biol., 9 (1981) 143; (b) R. Téoule, Int. J. Radiat. Biol., 51 (1987) 573.
37. S. Gregoli, M. Olast and A. Bertinchamps, Radiat. Res., 60 (1974) 388.
38. D. M. Close, E. Sagstuen, E. O. Hole and W. H. Nelson, J. Phys. Chem. B, 103 (1999) 3049.
39. S. D. Wetmore, R. J. Boyd, F. Himo and L. A. Eriksson, J. Phys. Chem. B, 103 (1999) 3051.
40. D. M. Close and W. A. Bernhard, J. Chem. Phys., 70 (1979) 210.
41. M. N. Schuchmann and C. von Sonntag, J. Chem. Soc., Perkin Trans., 2 (1977) 1958.
42. D. M. Close, Radiat. Res., 147 (1997) 663.
43. D. M. Close, W. H. Nelson, E. Sagstuen and E. O. Hole, Radiat. Res., 137 (1994) 300.
44. K. Miaskiewicz and R. Osman, J. Am. Chem. Soc., 116 (1994) 232.
45. A.-O. Colson and M. D. Sevilla, J. Phys. Chem., 99 (1995) 3867.
46. W. Saenger, Principles of Nucleic Acid Structure, Springer-Verlag, New York, 1984.
47. J. Hüttermann, W. Köhnlein, R. Téoule and A. J. Bertinchamps (Eds.), Effects of Ionizing Radiation on DNA, Springer, Heidelberg, 1978.
48. E. Sagstuen, J. Magn. Reson., 44 (1981) 518.
49. E. O. Hole, W. H. Nelson, E. Sagstuen and D. M. Close, Radiat. Res., 130 (1992) 148.
50. C. Alexander, Jr. and C. E. Franklin, J. Chem. Phys., 54 (1971) 1909.
51. B. Rakvin and J. N. Herak, Radiat. Res., 88 (1981) 240.
52. E. Sagstuen, Radiat. Res., 84 (1980) 164.
53. E. O. Hole and E. Sagstuen, Radiat. Res., 109 (1987) 190.
54. A. Ehrenberg, L. Ehrenberg and G. Löfroth, Nature, 200 (1963) 376.
55. R. Salovey, R. G. Shulman and W. M. Walsh, Jr., J. Chem. Phys., 39 (1963) 839.
56. P. S. Pershan, R. G. Shulman, B. J. Wyluda and J. Eisinger, Science, 148 (1964) 378.
57. A. Ehrenberg, A. Rupprecht and G. Ström, Science, 157 (1967) 1317.
58. M. G. Ormerod, Int. J. Radiat. Biol., 9 (1965) 291.
59. A. Gräslund, A. Ehrenberg, A. Rupprecht and G. Ström, Biochim. Biophys. Acta, 254 (1971) 172.
60. A. Gräslund, A. Ehrenberg, A. Rupprecht, B. Tjäldin and G. Ström, Radiat. Res., 61 (1975) 488.
61. A. Gräslund, A. Ehrenberg, A. Rupprecht, G. Ström and H. Crespi, Int. J. Radiat. Biol., 28 (1975) 313.
62. W. A. Bernhard, Adv. Radiat. Biol., 9 (1981) 199.
63. I. Zell, J. Hüttermann, A. Gräslund, A. Rupprecht and W. Köhnlein, Free Radical Res. Commun., 6 (1989) 105.
64. P. M. Cullis, J. D. McClymont, M. E. Malone, A. N. Mather, I. D. Podmore, M. C. Sweeney and M. C. R. Symons, J. Chem. Soc., Perkin Trans., 2 (1992) 1695.
65. W. A. Bernhard, J. Phys. Chem., 93 (1989) 2187.
66. J. Barnes, W. A. Bernhard and K. R. Mercer, Radiat. Res., 126 (1991) 104.
67. M. D. Sevilla, D. Becker, M. Yan and S. R. Summerfield, J. Phys. Chem., 95 (1991) 3409.
68. S. Steenken, J. P. Telo, H. M. Novais and L. P. Candeias, J. Am. Chem. Soc., 114 (1992) 4701.
69. M. Yan, D. Becker, S. Summerfield, P. Renke and M. D. Sevilla, J. Phys. Chem., 96 (1992) 1938.
70. S. G. Swarts, D. Becker, M. D. Sevilla and K. T. Wheeler, Radiat. Res., 145 (1996) 304.
71. (a) D. M. Close, E. Sagstuen and W. H. Nelson, J. Chem. Phys., 82 (1985) 4386; (b) E. O. Hole, W. H. Nelson, D. M. Close and E. Sagstuen, J. Chem. Phys., 86 (1987) 5218.
72. A.-O. Colson and M. D. Sevilla, Int. J. Radiat. Biol., 67 (1995) 627.
73. J. Hüttermann, M. Röhrig and W. Köhnlein, Int. J. Radiat. Biol., 61 (1992) 299.
74. J. Hüttermann, K. Voit, H. Oloff, W. Köhnlein, A. Gräslund and A. Rupprecht, Faraday Discuss. Chem. Soc., 78 (1984) 135.
75. W. Wang, M. Yan, D. Becker and M. D. Sevilla, Radiat. Res., 135 (1994) 2.
76. W. Gatzweiler, J. Hüttermann and A. Rupprecht, Radiat. Res., 138 (1994) 151.
77. B. Weiland, J. Hüttermann and J. van Tol, Acta Chem. Scand., 51 (1997) 585.
78. B. Weiland and J. Hüttermann, Int. J. Radiat. Biol., 74 (1998) 341.
79. I. D. Podmore, M. E. Malone, M. C. R. Symons, P. M. Cullis and B. G. Dalgarno, J. Chem. Soc. Faraday Trans., 2 (1991) 3647.
80. J. Hüttermann, Ultramicroscopy, 10 (1982) 25.
81. D. Becker, Y. Razskazovskii, M. U. Callaghan and M. D. Sevilla, Radiat. Res., 146 (1996) 361.
82. A. Sanderud and E. Sagstuen, J. Chem. Soc. Faraday Trans., 91 (1996) 995.
83. D. J. Nelson, M. C. R. Symons and J. L. Wyatt, J. Chem. Soc. Faraday Trans., 89 (1993) 1955.
84. W. Saenger, Principles of Nucleic Acid Structure, C. R. Cantor (Ed.), Springer-Verlag, New York, 1984.
85. S. Gregoli, M. Olast and A. Bertinchamps, Radiat. Res., 89 (1982) 238.
86. W. Wang, D. Becker and M. D. Sevilla, Radiat. Res., 135 (1993) 146.
87. S. G. Swarts, M. D. Sevilla, D. Becker, C. J. Tokar and K. T. Wheeler, Radiat. Res., 129 (1992) 333.
88. T. Ito, S. C. Baker, C. D. Stickley, J. G. Peak and M. J. Peak, Int. J. Radiat. Biol., 63 (1993) 289.
89. N. Mroczka and W. A. Bernhard, Radiat. Res., 135 (1993) 155.
90. J. Ohlmann and J. Hüttermann, Int. J. Radiat. Biol., 63 (1993) 427.
91. C. von Sonntag and H.-P. Schuchmann, Int. J. Radiat. Biol., 49 (1986) 1.
92. S. Steenken, Chem. Rev., 89 (1989) 503.
93. J. Hüttermann, M. Lange and J. Ohlmann, Radiat. Res., 131 (1992) 18.
94. (a) S. Gregoli, M. Olast and A. Bertinchamps, Radiat. Res., 65 (1976) 202; (b) S. Gregoli, M. Olast and A. Bertinchamps, Radiat. Res., 72 (1977) 201.
95. M. Malone, M. C. R. Symons and A. W. Parker, J. Chem. Soc., Perkin Trans., 2 (1993) 2067.
96. M. Lange, B. Weiland and J. Hüttermann, Int. J. Radiat. Biol., 68 (1995) 475.
97. D. M. Close, E. Sagstuen and W. H. Nelson, Radiat. Res., 116 (1988) 379.
98. M. D. Sevilla and D. Becker, in Electron Spin Resonance, A Specialist Periodical Report, Vol. 14, N. M. Atherton, M. J. Davis and B. C. Gilbert (Eds.), Royal Society of Chemistry, Cambridge, 1994, p. 130.
99. D. M. Close and W. A. Bernhard, Bull. Am. Phys. Soc., 25 (1980) 416.
100. E. Sagstuen, E. O. Hole, W. H. Nelson and D. M. Close, Radiat. Res., 149 (1998) 120.
101. S. Steenken, Free Radical Res. Commun., 16 (1992) 349.
102. E. Sagstuen, E. O. Hole, W. H. Nelson and D. M. Close, Radiat. Res., 146 (1996) 425.
103. M. Hutter and T. Clark, J. Am. Chem. Soc., 118 (1996) 7574.
104. E. O. Hole, W. H. Nelson, E. Sagstuen and D. M. Close, Radiat. Res., 149 (1998) 109.
105. E. Sagstuen, M. Lindgren and A. Lund, Radiat. Res., 128 (1991) 235.
106. S. Steenken and L. Goldbergerova, J. Am. Chem. Soc., 120 (1998) 3928.
L.A. Eriksson (Editor)
Theoretical Biochemistry: Processes and Properties of Biological Systems
Theoretical and Computational Chemistry, Vol. 9
© 2001 Elsevier Science B.V. All rights reserved
Chapter 12
New Computational Strategies for the Quantum Mechanical Study of Biological Systems in Condensed Phases
Carlo Adamo, Maurizio Cossi, Nadia Rega and Vincenzo Barone
Laboratory for the Structure and Dynamics of Molecules (LSDM), Dipartimento di Chimica, Università 'Federico II', via Mezzocannone 4, I-80134 Napoli, Italia
ABSTRACT
This chapter examines some of the methodological and computational aspects involved in modeling biomolecular systems at the quantum-mechanical level. In the first part we analyze in some detail a general strategy for the effective study of physico-chemical processes involving large molecules in condensed phases. The main building block of our approach is a modular electronic-structure tool rooted in density functional theory. This tool is coupled to an effective description of environmental effects by a mixed discrete-continuum model. The potential energy surfaces obtained in this way provide the input for a numerical treatment of a small number of large-amplitude motions (possibly involving light particles) coupled to a harmonic bath. In the second part of this contribution we discuss a number of prototypical applications, with the aim of conveying the potential of this integrated approach and its upcoming developments. The 'fil rouge' of our report is provided by open-shell systems. These are at the same time key intermediates in a number of biochemical processes and particularly challenging systems for both experimental investigations and quantum mechanical computations.
1. INTRODUCTION
The theoretical treatment of biomolecular systems is becoming increasingly important in modern science for at least two reasons. On the one hand, theoretical studies provide information that cannot easily be accessed by experimental methods. They also make it possible to dissect an overall effect into different contributions simply by switching individual interactions on and off selectively. On the other hand, they allow working hypotheses to be formulated that can stimulate further experimental work and reduce the number of tests to be performed. Of course, these tasks can be fulfilled only if the accuracy and reliability of theoretical results match experimental standards. While conventional approaches have reached remarkable accuracy for small and medium-size systems, biologically interesting molecules are invariably large,
flexible, and act not in vacuo but in aqueous solution. Effective numerical simulations based on empirical energy calculations can be routinely performed for chemically significant models. Even so, a number of problems (e.g. reactivity, proton and electron transfer, spectroscopic and photochemical processes) require a quantum mechanical approach. Thus theoretical and computational chemistry presently face the very demanding challenge of extending quantum mechanical approaches to large molecules. Both hardware and software developments are contributing to this task, leading to the first applications of reliable electronic structure methods to macromolecular systems. A leading role in this progress is played by fast multipole moment (FMM) methods, sparse matrix algorithms, and conjugate gradient density matrix search (CG-DMS) techniques for solving the self-consistent field (SCF) problem. At the same time, faster algorithms are being developed for geometry optimizations of large molecules, and effective composite approaches further reduce computer times. In this context, the situation is particularly favourable for Kohn-Sham (KS) methods, although promising progress is also being made for post-HF models. It therefore seems particularly important to examine the limits of current density functionals (DF) in describing specific features of biological systems, such as non-covalent interactions, proton and electron transfer, or spectroscopic parameters. This should hopefully allow the development of new functionals with improved reliability in these fields. Accordingly, the first section of this contribution is devoted to the work being performed in our laboratory in the framework of density functional theory. From another point of view, biological molecules are often very flexible, so a realistic computation of their properties cannot neglect vibrational averaging effects from large-amplitude motions.
This aspect is examined in the second part of this work. Finally, biological processes occur in solution, so the modeling of physico-chemical processes at a microscopic level must be extended from isolated molecules to condensed phases. Explicit inclusion of solvent molecules in numerical simulations is providing interesting results. However, only short-time local fluctuations are presently amenable to routine computation, which leaves aside fundamental phenomena such as conformational transitions or protein folding. At the same time, continuum approaches are becoming more and more effective and reliable, thanks to the increasing accuracy of the underlying model combined with their remarkable flexibility and efficiency. Here we concentrate on the so-called polarizable continuum model (PCM). Thanks to a number of recent improvements, the PCM is rapidly approaching the target of 'chemically accurate' computations for systems in condensed phases. As a matter of fact, nearly all the quantum mechanical procedures (including analytical gradients
and Hessians) developed for isolated systems are now also available, with comparable computational efficiency, for systems in solution. A brief sketch of the status of the PCM is given in the third part of the paper, together with some illustrative results. Finally, the last part of the report analyzes the usefulness of the computational tools sketched in the first three sections. This is done by means of some case studies involving unstable intermediates (radicals and zwitterionic species) of biological significance.
2. THE DENSITY FUNCTIONAL MODEL
In the last few years, Density Functional Theory (DFT) has become one of the most powerful tools in computational chemistry [1-3]. Indeed, an increasing number of studies deal with DFT, both in basic theoretical developments and in the wide field of chemical applications. There are several reasons for this success. First of all, methods rooted in DFT take into account a significant amount of electron correlation, providing accurate numerical results. As a matter of fact, the latest DFT implementations show an accuracy comparable to that of many-body perturbative methods [4]. Another major advantage of DFT is its favourable scaling with the size of the system under investigation. The Kohn-Sham (KS) approach [5], the most common route to DFT, rests on equations that are close to those developed for Hartree-Fock (HF) theory [1]. It was therefore quite easy to implement this model in several commercial quantum-mechanical packages by introducing only slight modifications into existing software. As a consequence, standard DFT approaches have reached nearly the same basis-set dependence as the HF method [6]. Furthermore, they can take advantage of the most recent implementations in the field of self-consistent field methods. For instance, algorithms such as Fast Multipole Methods (FMM) [7] or fast assembly of the Hamiltonian matrix [8] have been successfully applied to DFT methods, essentially without any modification. As a result, asymptotic linear scaling has been achieved, and sizeable systems (up to several hundred atoms) can be handled by this quantum mechanical tool. At the same time, the formally independent-particle nature of DFT allows the application of standard interpretative tools developed for the HF approach.
This is true not only for the standard Mulliken population analysis, but also for more sophisticated schemes, like the Natural Bond Orbital (NBO) analysis [9], the Atomic Polar Tensor population [10], or the Atoms in Molecules (AIM) approach [11]. These tools allow the use of familiar models to analyze the molecular wave function and to rationalize it in terms of classical chemical concepts. In short, DFT is providing very effective quantum
mechanical tools, which take into account most of the electron correlation, at a fraction of the computational cost required by conventional post-HF methods. The weakness of the DFT approach, however, lies in the non-classical part of the Hamiltonian. This is the so-called exchange-correlation contribution, which is an unknown functional of the electron density. A huge number of exchange and correlation functionals have been proposed, and they differ in physical soundness and numerical performance. In this context, the so-called hybrid HF/DFT models, which mix some HF exchange with DFT contributions, are nowadays considered standards because they perform well. In particular, the popular B3LYP approach comes close to so-called chemical accuracy for the properties of systems involving covalent bonds (e.g. the thermochemistry of the molecules belonging to the so-called G2 data set). It does the same for some non-covalent interactions, like hydrogen bonds. However, hybrid methods, as well as conventional DFT approaches, are not yet sufficiently accurate for a number of chemical problems, like van der Waals complexes, proton transfer, or SN2 reactions. These limits provide a strong driving force in the quest for new functionals. In our opinion, a major requirement for a successful exchange-correlation functional is its generality. A "good" functional should treat different chemical interactions and properties with the same accuracy, avoiding any excessive 'specialization' for a specific subset of interactions or properties. In the next paragraph we will discuss this last point in some detail. Of course we cannot be exhaustive, and we refer the reader to published reviews and textbooks for a more complete analysis [1-3].

2.1 Functionals of the electronic density

In the Kohn-Sham (KS) approach to DFT [1,5], the total energy can be expressed as:

$$E[\rho] = T_s[\rho] + V_{ext}[\rho] + J[\rho] + E_{xc}[\rho] \qquad (1)$$
Here, $V_{ext}[\rho]$ is the potential energy in the field of the nuclei plus any external perturbation. $T_s[\rho]$ is the kinetic energy of a set of n independent electrons moving in an effective one-electron potential that leads to the density $\rho(\mathbf{r})$. $J[\rho]$ is the total Coulomb interaction [1]. $E_{xc}[\rho]$ is the remainder, usually described as the exchange-correlation energy. This term represents the key problem in DFT: the exact $E_{xc}$ is unknown, so approximations must be used. The simplest approach is the local spin density approximation (LSD), in which the functional for the uniform electron gas of density ρ is integrated over the whole space:
$$E_{xc}^{LSD} = \sum_\sigma \int e_{xc}^{unif}(\rho_\sigma)\,\rho_\sigma(\mathbf{r})^{4/3}\,d\mathbf{r} \qquad (2)$$
where $e_{xc}^{unif}(\rho_\sigma)$ is the exchange-correlation energy per particle of a uniform electron gas and σ represents the spin (α or β). This approximation is responsible for the early success of DFT, but it often provides unsatisfactory results in chemical applications [3]. Starting from equation (2), several corrections for the non-uniformity of atomic and molecular densities have been proposed. In particular, corrections based on the gradient of the electron density (∇ρ) have received considerable attention in recent years because they are simple. These corrections are collectively referred to as the generalized gradient approximation (GGA). They are usually expressed as an enhancement factor over the exchange energy of the uniform electron gas, so that the total exchange energy takes the form:
$$E_x^{GGA} = E_x^{LSD} - \sum_\sigma \int F^{GGA}[\rho_\sigma, \nabla\rho_\sigma]\,\rho_\sigma(\mathbf{r})^{4/3}\,d\mathbf{r} \qquad (3)$$
While $e_{xc}^{unif}(\rho_\sigma)$ in equation (2) is uniquely defined, there is no unique function $F^{GGA}$, and a number of GGA exchange-correlation functionals have been proposed (see for instance refs. 12-20). Roughly speaking, we can recognize two main classes. The first collects functionals containing parameters fitted to some sets of experimental data. The second includes functionals that fulfill a number of theoretical physical constraints. Most existing functionals combine both approaches, but attention has recently shifted toward the first. This shift comes even at the expense of introducing a huge number of parameters and of overemphasizing the thermochemistry of organic molecules [21]. In contrast with this tendency, functionals belonging to the second class are particularly attractive to theoretical chemists, because of their strong theoretical background and their absence of any "specialization". Furthermore, a number of recent studies show that some parameter-free functionals are no less accurate than the most successful heavily parametrized models [22-27]. Developing such functionals involves theoretical difficulties, yet their number is increasing and there is a demand for even more stringent theoretical constraints [28-32]. It is thus natural to focus our attention on this class of exchange-correlation functionals and on their performance in biological applications.
2.2 The PBE functional

The non-empirical GGA functional of Perdew, Burke and Ernzerhof (PBE) [28] can be considered the most promising non-empirical functional. In particular, it was constructed to respect a number of physical constraints in both the correlation and the exchange parts. A detailed discussion of the physical background of the PBE functional is given in references 33 and 34. Here we just recall that it obeys the following six conditions:

1) the correct uniform-gas limit;
2) the correct spin and uniform density scaling of $E_x$;
3) the correct upper bound $E_x < 0$;
4) the correct upper bound $E_c < 0$;
5) the correct Lieb-Oxford lower bound [35];
6) the LSD linear response.

In this functional the correlation part is similar to the Perdew-Wang (PW) correlation functional [36], while the exchange contribution is:

$$F^{PBE} = 1 + \kappa - \frac{\kappa}{1 + \mu s^2/\kappa} \qquad (4)$$

with κ = 0.804, μ = 0.21951 and $s = |\nabla\rho| / (2 k_F \rho)$, where $k_F = (3\pi^2\rho)^{1/3}$. This form is not completely new,
because it is the same form used by Becke in his 1986 paper [37]. There, the κ and μ parameters (0.967 and 0.235, respectively) were determined by a fitting procedure. What is new, and what makes the PBE approach strong, is how its parameters were obtained. In PBE, κ and μ in equation 4, as well as the other parameters in the GGA correlation functional, were obtained by imposing the above-mentioned constraints [28]. These conditions determine the behavior of the functional and its numerical performance in different chemical "situations". For instance, we have recently shown that the correct asymptotic conditions matter less in high-density regions (which correspond to covalent bonds) than in low-density regions. This point is of particular importance, since the latter regions are responsible for non-covalent interactions, such as H-bonds, van der Waals (vdW) and charge-transfer (CT) interactions, and for spin polarization effects in EPR observables [38]. Some of these conditions are also fulfilled by other current exchange-correlation functionals. For instance, the Becke 88 exchange functional [14] respects only three of the above conditions, as does the functional developed by Gill [16]. Table 1 collects error statistics for several density functionals, concerning the atomization energies of 55 molecules belonging to the so-called G2 set [39],
which is nowadays considered a standard for the validation of new quantum chemical approaches [40].
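Equation (4) is easy to evaluate directly. The sketch below is an illustration, not the authors' code. It assumes a spin-unpolarized model density ρ(r) = exp(-2r)/π in atomic units, chosen only because its gradient is analytic. For that density it computes the reduced gradient s = |∇ρ|/(2k_Fρ) and then compares the constraint-derived PBE pair (κ = 0.804, μ = 0.21951) with Becke's fitted 1986 pair (κ = 0.967, μ = 0.235) quoted above. The function names are hypothetical.

```python
import math


def enhancement_factor(s: float, kappa: float, mu: float) -> float:
    """PBE-type exchange enhancement factor of equation (4)."""
    return 1.0 + kappa - kappa / (1.0 + mu * s * s / kappa)


def reduced_gradient(rho: float, grad_rho: float) -> float:
    """s = |grad rho| / (2 k_F rho) with k_F = (3 pi^2 rho)^(1/3), atomic units."""
    k_f = (3.0 * math.pi ** 2 * rho) ** (1.0 / 3.0)
    return abs(grad_rho) / (2.0 * k_f * rho)


def model_density(r: float) -> tuple:
    """Hypothetical model density rho(r) = exp(-2r)/pi and its radial derivative (-2 rho)."""
    rho = math.exp(-2.0 * r) / math.pi
    return rho, -2.0 * rho


PBE = {"kappa": 0.804, "mu": 0.21951}   # fixed by physical constraints
B86 = {"kappa": 0.967, "mu": 0.235}     # Becke's fitted values quoted in the text

for r in (0.5, 1.0, 2.0, 4.0):
    rho, grad = model_density(r)
    s = reduced_gradient(rho, grad)
    f_pbe = enhancement_factor(s, **PBE)
    f_b86 = enhancement_factor(s, **B86)
    print(f"r = {r:3.1f} bohr  s = {s:5.2f}  F_PBE = {f_pbe:.3f}  F_B86 = {f_b86:.3f}")
```

Both parameter sets give F = 1 at s = 0 and F ≈ 1 + μs² for small s. They differ mainly in the large-s limit 1 + κ, which is exactly where the bounds discussed below come into play.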
Table 1. Mean absolute errors (mae's, kJ/mol) and maximum errors for atomization energies of the original G2 set (55 molecules). The values have been computed using the MP2/6-31G(d) geometries of reference 39 and the 6-311+G(2df,2pd) basis set.

Method              mae     max error
GGA functionals
  BLYP              40.2    107.9 (CO2)
  PBE               36.0    110.4 (CO2)
  BPBE              27.2     98.7 (CO2)
  revPBE            20.1    -82.8 (Si2H6)
  RPBE              18.8    -86.6 (Si2H6)
Hybrid functionals
  B3LYP             10.0    -34.3 (BeH)
  PBE0              14.6    -42.7 (Si2H6)
τ-functionals
  mGGA              14.2     56.1 (O2)
These results give a flavor of how the different functionals perform on this set of covalently bonded molecules, and they are a starting point for a deeper discussion of chemical applications. From these data, the PBE functional clearly performs as well as more empirical DFT approaches, like the BLYP model (Becke 88 exchange [14] and Lee-Yang-Parr correlation [19]). In table 2 we report the deviations for the geometrical parameters and harmonic vibrational frequencies of 32 molecules belonging to the G2 set. Here, the deviations of PBE are close to those of the BLYP functional, which gives further support to the reliability of this model. It is clear, however, that these results are still far from the accuracy required for chemical applications (e.g. about 5 kJ/mol for atomization energies). Furthermore, the PBE functional suffers from other problems. For instance, the energy barriers for proton transfer reactions [22], as well as some chemisorption energies [31], are still significantly underestimated.
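Tables 1 and 2 summarize each functional by two numbers, the mean absolute error and the signed maximum error together with the molecule where it occurs. The minimal sketch below computes both. It assumes errors are defined as computed minus reference (the tables do not state their sign convention), and its numbers are made up for illustration, not taken from Table 1.

```python
def error_statistics(computed, reference, labels):
    """Return (mae, signed maximum error, label of the worst case).

    Errors are taken as computed - reference (an assumed sign convention).
    """
    if not (len(computed) == len(reference) == len(labels)) or not computed:
        raise ValueError("need equally long, non-empty sequences")
    errors = [c - r for c, r in zip(computed, reference)]
    mae = sum(abs(e) for e in errors) / len(errors)
    worst = max(range(len(errors)), key=lambda i: abs(errors[i]))
    return mae, errors[worst], labels[worst]


# Made-up atomization energies (kJ/mol), not taken from Table 1.
labels = ["mol-1", "mol-2", "mol-3"]
reference = [105.0, 200.0, 300.0]
computed = [100.0, 210.0, 295.0]
mae, max_err, where = error_statistics(computed, reference, labels)
print(f"mae = {mae:.1f} kJ/mol, max error = {max_err:+.1f} ({where})")
```

Keeping the sign of the maximum error preserves information the mae discards. A negative value under this convention indicates underbinding.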
Table 2. Mean absolute errors (mae's) and maximum errors for the bond lengths (d, Å) and harmonic vibrational frequencies (ν, cm-1) of the molecules belonging to the reduced G2 data set. All values have been computed using the 6-311G(d,p) basis set.

                    d (Å)                     ν (cm-1)
Method              mae     max error         mae    max error
GGA functionals
  BLYP              0.013   0.075 (Li2)       77     212 (CH)
  PBE               0.011   0.064 (Li2)       59     194 (H2CO, 2b2)
  revPBE            0.012   0.072 (LiF)       65     215 (NH)
  RPBE              0.013   0.075 (Li2)       72     195 (H2CO, 2b2)
Hybrid functionals
  B3LYP             0.004   0.057 (Li2)       32     135 (NO)
  PBE0              0.007   0.062 (Li2)       40     144 (NO)
τ-functionals
  mGGA              0.019   0.111 (Li2)       72     196 (CH, NH)
2.3 Beyond the PBE functional

To simplify the following discussion, we separate the correlation and exchange contributions, i.e.

$$E_{xc} = E_x + E_c \qquad (5)$$
Although this is the most common representation of $E_{xc}$, the distinction between the two contributions is somewhat artificial in the context of DFT. It can also be misleading. There is some error compensation between the two partners, so combining exchange and correlation functionals from different sources can be dangerous. Nevertheless, separating the two terms considerably simplifies the discussion, and we adopt this distinction in what follows. Some efforts have been made to improve the numerical performance of the PBE exchange functional without modifying its theoretical background. Among all the PBE conditions, the fulfillment of the Lieb-Oxford (LO) bound [35],

$$E_x \geq -1.679\,\rho(\mathbf{r})^{4/3} \qquad (6)$$
raises some questions [29,31]. Figure 1 shows the behavior of some exchange functionals with respect to this bound. In particular, the Becke 86 (B86, ref. 12), PBE, revPBE and RPBE exchange functionals have been considered. The first three functionals correspond to different values of the μ and κ constants in equation 4, whereas the last functional has a different dependence on ρ and ∇ρ. When constructing the revPBE functional [29], Zhang and Yang pointed out that equation (6) can be regarded as a local LO limit. For a given electron density, fulfilling it is sufficient, but not necessary, for the fulfillment of the true, integrated LO bound:
$$E_x \geq E_{xc} \geq -1.679 \int \rho(\mathbf{r})^{4/3}\,d\mathbf{r} \qquad (7)$$
In particular, the choice κ = 1.245 leads both to fulfillment of the LO bound in all chemically significant situations and to more accurate energies for atoms and covalent molecules [29]. However, employing the local bound in the GGA construction ensures that the integrated bound will be satisfied for any possible electron density. Furthermore, optimizing the parameters for a specific property may worsen the behaviour of the functional in other situations. As a consequence, the revPBE functional, while based on interesting considerations, loses its generality (see the discussion in reference 41, and below). Recently, Hammer, Hansen and Nørskov observed that the revPBE functional differs from the original PBE exchange in the region s < 2.5, where it still fulfills the local LO bound [31] (see figure 1). This suggests that it should be possible to build from PBE a new GGA functional that follows the revPBE exchange only for s values up to 2.5. The resulting functional form is:
$$F_x^{RPBE} = 1 + \kappa\left(1 - e^{-\mu s^2/\kappa}\right) \qquad (8)$$
They called this functional RPBE. The revPBE functional deviates from PBE only in the value of one parameter (κ) in the exchange enhancement factor $F_x(s)$. The RPBE functional, by contrast, deviates from PBE in the form of the functional itself. RPBE preserves all the correct features of the parent PBE model. This functional provides very good chemisorption energies, but has not yet been tested on molecular systems. The behavior of the revPBE and RPBE functionals with respect to the LO limit is shown in figure 1.
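The link between κ and the local Lieb-Oxford bound can be made concrete with a short calculation. The sketch below is illustrative and rests on two assumptions. First, equation (6) is read as a pointwise bound on the exchange energy density and compared with LDA exchange, e_x = -A_x ρ^{4/3} with A_x = (3/4)(3/π)^{1/3}, which bounds F_x by 1.679/A_x ≈ 2.273. Second, the spin-scaling relation tightens this by a factor 2^{1/3} to about 1.804, the limit that κ = 0.804 in PBE is designed to respect. The code then checks the PBE, revPBE (κ = 1.245) and RPBE forms against that limit. Function names are hypothetical.

```python
import math

# LDA exchange energy density for an unpolarized gas: e_x = -A_x rho^(4/3)
A_X = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)   # ~0.7386
LIEB_OXFORD = 1.679

# Pointwise reading of equation (6): F_x <= 1.679 / A_x (~2.273);
# spin scaling tightens this by 2^(1/3) to ~1.804.
F_MAX_UNPOLARIZED = LIEB_OXFORD / A_X
F_MAX = F_MAX_UNPOLARIZED / 2.0 ** (1.0 / 3.0)

MU = 0.21951


def f_pbe(s: float, kappa: float = 0.804) -> float:
    """Equation (4); kappa = 1.245 gives revPBE."""
    return 1.0 + kappa - kappa / (1.0 + MU * s * s / kappa)


def f_rpbe(s: float, kappa: float = 0.804) -> float:
    """Equation (8)."""
    return 1.0 + kappa * (1.0 - math.exp(-MU * s * s / kappa))


print(f"local LO limit on F_x: {F_MAX:.3f}")
for s in (0.0, 1.0, 2.0, 3.0, 5.0, 10.0):
    print(f"s = {s:4.1f}  PBE {f_pbe(s):.3f}  "
          f"revPBE {f_pbe(s, kappa=1.245):.3f}  RPBE {f_rpbe(s):.3f}")
```

PBE and RPBE approach the same limit 1 + κ = 1.804 from below and agree to second order in s. revPBE tends to 2.245 and crosses the local limit at large s, which is the situation shown in figure 1.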
[Figure 1: curves labeled revPBE and RPBE plotted against s from 0 to 10 (vertical axis 1.0 to 2.5), with a horizontal line marking the Lieb-Oxford limit]

Figure 1. Asymptotic behavior of some exchange functionals belonging to the PBE family.

The functionals considered (B86, PBE, revPBE and RPBE) behave differently in the region 1 < s < 3, which is the most important region in real systems [30]. This is clearly reflected in the tests on the G2 set (see table 1). In fact, a significant improvement is found in the computed atomization energies: the mae is 20.1 kJ/mol for revPBE and 18.8 kJ/mol for RPBE (see table 1). Similarly, the error on the geometries decreases slightly, being 0.010 Å for both functionals (see table 2).
2.4 A further improvement: the hybrid HF/DF methods

Variations of the functional form of the PBE exchange do bring some improvements in numerical accuracy. Even so, we believe that hybrid HF/DFT models offer a better avenue toward increased reliability [42]. In this context, the three-parameter model of Becke is the most successful approach. This model rests on a linear combination of HF exchange with DF exchange-correlation contributions:

$$E_{xc}^{hybrid} = a_{x0} E_x^{LSD} + (1 - a_{x0}) E_x^{HF} + a_{x1}\,\Delta E_x^{GGA} + E_c^{LSD} + a_c\,\Delta E_c^{GGA} \qquad (9)$$
where $\Delta E_x^{GGA}$ and $\Delta E_c^{GGA}$ are the GGA contributions to exchange and correlation, while $E_x^{LSD}$ and $E_c^{LSD}$ are their LSD counterparts. The three semiempirical parameters, $a_{x0}$, $a_{x1}$ and $a_c$, have been determined by fitting the heats of formation of a standard set of molecules. The most popular implementation is the so-called B3LYP method, which uses (in a self-consistent
way) the Becke 88 exchange functional [14] together with the Lee-Yang-Parr correlation functional [19]:

$$E_{xc}^{B3LYP} = a_{x0} E_x^{LSD} + (1 - a_{x0}) E_x^{HF} + a_{x1}\,\Delta E_x^{B} + E_c^{VWN} + a_c\,\Delta E_c^{LYP} \qquad (10)$$
where the LSD contribution to the exchange energy is that of a uniform spin-polarized electron gas, and the local correlation component is represented by the Vosko-Wilk-Nusair parametrization [43]. The local part of the LYP correlation functional is not too different from the correlation energy of the uniform electron gas. For this reason, the gradient correction in the last term is usually taken as:
$$\Delta E_c^{LYP} = E_c^{LYP} - E_c^{VWN} \qquad (11)$$
Building on this experience, Becke has recently proposed a single-parameter version [44]:
$$E_{xc} = a_0 E_x^{HF} + (1 - a_0)\left(E_x^{LSD} + \Delta E_x^{GGA}\right) + E_c^{LSD} + \Delta E_c^{GGA} = a_0\left(E_x^{HF} - E_x^{GGA}\right) + E_{xc}^{GGA} \qquad (12)$$

where $a_0$ ranges between 0.16 and 0.28, depending on the choice of the GGA correlation functional. Once again, this parameter is chosen to give the best fit to the experimental data for a standard set of molecules. But can the hybrid methods be given a sounder physical foundation? The adiabatic connection formula [45] provides solid ground. This formula is usually expressed in the form:
$$E_{xc} = \int_0^1 E_{xc,\lambda}\,d\lambda \qquad (13)$$
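with

$$E_{xc,\lambda} = \langle \Psi_\lambda | \hat{V}_{ee} | \Psi_\lambda \rangle - \frac{e^2}{2} \iint \frac{\rho(\mathbf{r})\,\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}' \qquad (14)$$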
where λ is an electronic coupling-strength parameter that switches on the Coulomb repulsion between electrons, and $E_{xc,\lambda}$ is the corresponding potential energy of
exchange and correlation for electron-electron interactions at intermediate coupling strength λ. The integrand of equation (13) refers explicitly to the potential energy only; the kinetic part of the exchange-correlation energy is generated by the λ integration. This formula connects the noninteracting KS reference system (λ = 0) to the fully interacting real system (λ = 1) through a continuum of partially interacting systems (0 < λ < 1).
$$E_{xc} = E_{xc}^{GGA} + \tfrac{1}{4}\left(E_x^{HF} - E_x^{GGA}\right) \qquad (15)$$
We have recently shown that the numerical performance of some of these models is comparable to that of current three-parameter hybrids like B3LYP [47,48]. In particular, we have obtained the PBE0 model by casting the PBE functional in equation (15) [49]. This model provides very good results, both for the thermochemistry of the molecules belonging to the G2 set (see table 1) and for the corresponding geometric parameters (see table 2). These results, however, only give some indication of how the different DFT protocols perform, and they are not conclusive. In fact, all the hybrid methods describe the thermochemistry, geometries and harmonic frequencies of the G2 molecules relatively well. Some well-known molecular systems, however, are much more demanding (e.g. van der Waals complexes). To probe this, we have chosen a prototypical system, namely the He dimer, which is representative of very weak van der Waals interactions. This system is well known to be very difficult to handle within DFT. The most common DFT methods, including some hybrid HF/DF approaches, significantly underestimate the interaction strength in such complexes. Others, like those using the Perdew and Wang functional, significantly overestimate it [50,51]. Figure 2 compares the energy profiles obtained with different functionals against the accurate analytical expression of ref. [52].
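The derivation that leads from equation (13) to a one-parameter hybrid is not reproduced in this excerpt. One published rationale, due to Perdew, Ernzerhof and Burke, is assumed here and is not necessarily the authors' argument. It models the coupling-constant integrand as a GGA integrand plus a correction (E_x^HF - E_x^GGA)(1 - λ)^(n-1). This model recovers exact exchange at λ = 0, and integrating it over λ gives a mixing coefficient 1/n. With n = 4 this yields the 1/4 of exact exchange used in PBE0. The sketch integrates that model numerically and compares the result with the closed form. The linear GGA integrand and all energies are hypothetical toy values.

```python
def simpson(f, a: float, b: float, n: int = 200) -> float:
    """Composite Simpson rule on n (even) intervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def model_integrand(lam, ex_hf, ex_gga, ec_gga, n_peb=4):
    """Assumed integrand: toy GGA part plus (E_x^HF - E_x^GGA)(1 - lambda)^(n_peb - 1)."""
    gga_part = ex_gga + 2.0 * ec_gga * lam   # toy linear form, integrates to ex_gga + ec_gga
    return gga_part + (ex_hf - ex_gga) * (1.0 - lam) ** (n_peb - 1)


def one_parameter_hybrid(ex_hf, ex_gga, ec_gga, a=0.25):
    """E_xc = E_xc^GGA + a (E_x^HF - E_x^GGA), the form of equations (12) and (15)."""
    return (ex_gga + ec_gga) + a * (ex_hf - ex_gga)


# Hypothetical component energies (hartree).
EX_HF, EX_GGA, EC_GGA = -8.90, -8.80, -0.40
integrated = simpson(lambda lam: model_integrand(lam, EX_HF, EX_GGA, EC_GGA), 0.0, 1.0)
closed_form = one_parameter_hybrid(EX_HF, EX_GGA, EC_GGA)
print(f"numerical {integrated:.10f}  closed form {closed_form:.10f}")
```

Simpson's rule is exact for polynomials up to cubic order, so the two numbers agree to rounding error. Changing `n_peb` to 2 gives the half-and-half mixing (a = 1/2).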
[Figure 2: interaction energy (vertical axis, -0.004 to 0.006) versus d(He-He), Å (2.4 to 4.0); one curve is labeled RPBE]

Figure 2. Potential-energy curves for the interaction of two He atoms. The curve labeled "exact" has been computed using the analytical expression of reference [52]. All the curves are corrected for BSSE effects [53].

These plots show that both the PBE and PBE0 models predict interaction energies and equilibrium distances sufficiently close to the experimental values [52]. In particular, the PBE0 functional gives an interatomic distance for the He dimer (2.78 Å) slightly shorter than the experimental value (2.97 Å). Its interaction energy is slightly overestimated (0.002 vs. 0.001 eV). Interestingly, all the other functionals give minima that are too deep. We have also obtained remarkably accurate results for a large number of molecular properties, including EPR and NMR spectroscopic parameters, polarizabilities and vertical excitation energies [24,25,54-56].
2.5 Beyond the GGA functionals

All these efforts show how severely limited the GGA form is. In practice, some compromise must be made between molecular applications, which require more nonlocal functionals [57], and solid-state uses, for which local functionals always provide good results [58]. One general way to go beyond the GGA is to construct a fully nonlocal density functional. This goal is quite ambitious, and attempts might lead to functionals of little practical use [59]. A more practical way is to construct a functional that incorporates additional semilocal information. This can be done, for instance, by including the kinetic energy density of the occupied KS orbitals:
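$$\tau_\sigma(\mathbf{r}) = \frac{1}{2}\sum_i^{occup} \left|\nabla\psi_{i\sigma}(\mathbf{r})\right|^2 \qquad (16)$$

so that

$$E_{xc}^{GGA} = E_{xc}^{LSD} - \sum_\sigma \int F_{xc}^{GGA}[\rho_\sigma, \nabla\rho_\sigma, \tau_\sigma]\,\rho_\sigma(\mathbf{r})^{4/3}\,d\mathbf{r} \qquad (17)$$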
Interestingly, early τ-functionals preceded most GGAs [60,61]. These τ-functionals (sometimes referred to as meta-GGAs [62]) seem quite promising, as recent constructions based upon fits to chemical data have demonstrated [63-65]. They also have some important properties. In particular, the PBE functional, like other GGAs, is not self-interaction free: its correlation energy does not vanish for a one-electron density. In this sense, nonlocal functionals using the kinetic energy density are self-interaction correlation (SIC) free by construction. Very recently, Perdew and co-workers proposed a τ-functional developed in the same spirit as the original PBE model [62]. It includes new physical constraints while keeping those already built into the PBE functional. One of the new features is that the exchange part reproduces the correct second- and fourth-order terms of the gradient expansion [66]. Unfortunately, this improved behavior requires the introduction of two new adjustable parameters. The functional forms of this exchange and correlation functional (hereafter referred to simply as mGGA) are similar to those of the parent PBE. In fact, the enhancement factor of the exchange is
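$$F_x^{mGGA} = 1 + \kappa - \frac{\kappa}{1 + x/\kappa} \qquad (18)$$

where

$$x = \frac{10}{81}p + \frac{146}{2025}q^2 - \frac{73}{405}qp + \left[D + \frac{1}{\kappa}\left(\frac{10}{81}\right)^2\right]p^2 \qquad (19)$$

with

$$p = \frac{|\nabla\rho|^2}{4(3\pi^2)^{2/3}\rho^{8/3}}; \qquad q = \frac{3\tau}{2(3\pi^2)^{2/3}\rho^{5/3}} - \frac{9}{20} - \frac{p}{12} \qquad (20)$$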
As usual, the parameter D is estimated by minimizing the mean absolute error in the atomization energies of a standard set of molecules (D = 0.113). The correlation counterpart is simply the PBE correlation functional with an additional contribution, which makes the correlation energy vanish for one-electron systems:
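$$E_c^{mGGA} = \int d^3r \left\{ \rho\,\varepsilon_c^{PBE}(\rho_\uparrow,\rho_\downarrow,\nabla\rho_\uparrow,\nabla\rho_\downarrow)\left[1 + C\left(\frac{\sum_\sigma \tau_\sigma^W}{\sum_\sigma \tau_\sigma}\right)^2\right] - (1 + C)\sum_\sigma \left(\frac{\tau_\sigma^W}{\tau_\sigma}\right)^2 \rho_\sigma\,\varepsilon_c^{PBE}(\rho_\sigma, 0, \nabla\rho_\sigma, 0) \right\} \qquad (21)$$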
Here

$$\tau_\sigma^W(\mathbf{r}) = \frac{1}{8}\,\frac{|\nabla\rho_\sigma|^2}{\rho_\sigma} \qquad (22)$$
is the Weizsäcker kinetic energy density, which is, of course, exact for one-electron systems. Thus equation (21) vanishes for any one-electron density. The parameter C is chosen to reproduce the surface correlation energies for jellium obtained with the PBE functional (C = 0.53). As shown in tables 1 and 2, the mGGA functional performs remarkably well for atomization energies [53]. Its molecular geometries and frequencies, however, are significantly worse than those provided by the PBE functional [67].

2.6 Some tests
Before describing specific applications in detail, we wish to discuss some general trends. Our aim is to give a flavor of how the different functionals perform and to stress their limits in practical chemical applications. Besides the PBE and mGGA exchange-correlation functionals, we have also considered the PBE0 hybrid model. We have also considered the GGA functional obtained by adding the RPBE exchange to the original PBE correlation part. The revPBE functional does not respect all the PBE constraints and gives results close to those of RPBE, so we will no longer consider it. All the DFT computations have been performed with the development version of the Gaussian package [68].
2.6.1 EPR hyperfine coupling constants

Free radicals are generally short-lived, highly reactive species, usually characterized experimentally only by their magnetic properties. Thus a successful theoretical approach must be able to provide reliable structural and magnetic properties at the same time. As representative models we have chosen the methyl and allyl radicals and the formaldehyde radical cation. The isotropic hyperfine coupling constant (hcc) of a magnetically active nucleus N, a(N), is related to the spin density at the nucleus by [69]

$$a(N) = \frac{8\pi}{3h}\, g_e \beta_e\, g_N \beta_N \sum_{\mu,\nu} P_{\mu\nu}^{\alpha-\beta} \langle \varphi_\mu | \delta(r_{kN}) | \varphi_\nu \rangle \qquad (23)$$

where $\beta_e$ and $\beta_N$ are the electron and nuclear magnetons, respectively, and $g_e$ and $g_N$ are the corresponding g factors. h is the Planck constant, δ(r) is the Dirac delta operator, and $P^{\alpha-\beta}$ is the difference between the density matrices for electrons with α and β spin. In the present work, all values are given in gauss (1 G = 0.1 mT), assuming that the free-electron g value is appropriate also for the radicals. To convert the data to MHz, multiply them by 2.8025. The methyl radical is well characterized from both experimental and theoretical points of view [e.g. 70-72], which makes it the most natural benchmark for the study of π-radicals. Some results are shown in table 3.

Table 3. CH bond length (Å) and isotropic hcc's (G) computed for the methyl radical by different methods. All the computations have been performed using the EPR-III basis set.

Parameter   CCSD[T](a)   B3LYP   PBE     RPBE    mGGA    PBE0    exp.(b)
CH          1.079        1.079   1.085   1.088   1.091   1.080   1.079
a(C)        27.8         29.8    24.6    28.0    27.3    28.6    28.4
a(H)        -24.6        -23.3   -23.4   -24.7   -29.2   -26.2   -25.1
a) ref. 70; b) ref. 71.

The PBE0 approach clearly gives the most accurate results, even compared with the B3LYP model. Note, furthermore, that conventional density functionals give comparable hyperfine splittings for the hydrogen atom, but disappointing results for carbon [73]. The allyl radical is of particular theoretical interest as a small molecule that exhibits doublet instability, or symmetry breaking. As a consequence, the restricted open-shell HF (ROHF) method fails to reproduce the C2v equilibrium structure indicated by experimental studies [74]. One must, therefore, resort to unrestricted (UHF or UKS) or multiconfigurational (MCSCF) methods. The results obtained using different functionals are reported in table 4. As for the methyl radical, good agreement is found between the
geometric parameters obtained at the MCSCF level and those delivered by the PBE0 model. The largest difference is found for the CαCβ bond. Its PBE0 length (1.379 Å) is slightly shorter than both the MCSCF (1.388 Å) [75] and experimental (1.387 Å) [76] values.
Table 4. Geometric parameters (Å and degrees) and isotropic hcc's (G) computed for the allyl radical by different methods.

Parameter   MP2(a)   CAS(a,b)   B3LYP   PBE     RPBE    mGGA    PBE0    exp.(c)
CαCβ        1.377    1.388      1.381   1.387   1.392   1.396   1.379   1.387
CαHα        1.088    1.087      1.085   1.094   1.096   1.100   1.087   1.087
CβHβ1       1.084    1.084      1.082   1.091   1.092   1.096   1.084   1.085
CβHβ2       1.082    1.082      1.080   1.088   1.090   1.094   1.082   1.082
CβCαHα      117.8    117.5      117.5   117.5   117.7   117.6   117.6   118.0
CαCβHβ1     121.0    121.2      121.0   121.0   121.3   120.8   121.0   121.2
CαCβHβ2     121.4    121.4      121.4   121.5   121.2   121.5   121.4   121.5
a(Hα)       22.0     1.1        4.4     3.0     3.6     4.7     5.6     4.2
a(Hβ1)      -33.2    -9.6       -15.5   -14.5   -15.5   -18.8   -17.3   -13.9
a(Hβ2)      -34.6    -9.1       -14.6   -13.6   -14.6   -17.7   -16.3   -14.8
a(Cα)       -53.4    -17.2      -16.2   -13.4   -14.3   -14.9   -18.5   -17.2
a(Cβ)       50.3     35.4       18.5    14.2    16.1    16.5    19.2    21.9
a) D95 basis set; b) ref. 75; c) ref. 76.

The remarkable accuracy of the PBE0 geometry is reflected in the computed hcc's, which are close to their experimental counterparts. This is particularly true for the C hyperfine constants, whereas the absolute values for the hydrogen atoms are slightly overestimated. As a last open-shell system, we have chosen the H2CO+ radical, which has been well characterized at both the theoretical and experimental level [77-80]. In this radical, the singly occupied molecular orbital is an in-plane n orbital located mainly on the oxygen center [79]. Its symmetry means that only spin polarization effects contribute to the isotropic hcc's of C and O. The spin densities at the hydrogens, by contrast, also receive a direct delocalization contribution. As expected, the results of table 5 show that all the functionals reproduce the isotropic hcc's of the H atoms well, whereas the results for the heavy atoms are much more scattered.
In particular, the hyperfine constant of the carbon atom is -35 G at the PBE0 level. This agrees better with the experimental value (-39 G) [80] than the B3LYP prediction does (-34 G). Notably, the PBE0 functional provides sufficiently accurate values for both the H and C atoms. Unfortunately, no experimental value is available for the oxygen atom.
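Equation (23) is written in Gaussian units. In SI units the prefactor 8π/3 becomes 2μ0/3. The sketch below is an illustration, not the authors' code. It applies this SI form to the simplest possible case, the hydrogen atom, whose spin density at the nucleus is 1/π bohr⁻³. It also recovers the 2.8025 MHz-per-gauss conversion quoted above from g_e μ_B/h. Constants are CODATA 2018 values; the function name is hypothetical.

```python
import math

# CODATA 2018 constants (SI units)
H = 6.62607015e-34          # Planck constant, J s
MU_B = 9.2740100783e-24     # Bohr magneton, J/T
MU_N = 5.0507837461e-27     # nuclear magneton, J/T
MU_0 = 1.25663706212e-6     # vacuum permeability, T m/A
BOHR = 5.29177210903e-11    # Bohr radius, m
G_E = 2.00231930436         # free-electron g value
G_PROTON = 5.5856946893     # 1H nuclear g factor

# 1 G = 1e-4 T; g_e mu_B / h turns a field into a frequency (~2.8025 MHz per gauss).
MHZ_PER_GAUSS = G_E * MU_B / H * 1e-4 / 1e6


def fermi_contact_mhz(spin_density_au: float, g_n: float) -> float:
    """Isotropic hcc (MHz) from the spin density at the nucleus (bohr^-3), SI form of eq. (23)."""
    rho_si = spin_density_au / BOHR ** 3
    a_hz = (2.0 / 3.0) * MU_0 * G_E * MU_B * g_n * MU_N * rho_si / H
    return a_hz / 1e6


# Hydrogen atom: one unpaired 1s electron, |psi(0)|^2 = 1/pi bohr^-3.
a_mhz = fermi_contact_mhz(1.0 / math.pi, G_PROTON)
print(f"a(H) = {a_mhz:.1f} MHz = {a_mhz / MHZ_PER_GAUSS:.1f} G")
```

The estimate comes out at about 1423 MHz (roughly 508 G), slightly above the measured 1420.4 MHz. The residual reflects effects that equation (23) ignores, such as the finite nuclear mass. In a molecule, the same prefactor multiplies the spin density obtained from the sum in equation (23).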
Table 5. Geometrical parameters (Å and degrees) and isotropic hcc's (G) computed for the H2CO+ radical by different methods.

Parameter   CCSD(a)   CISD(b)   B3LYP   PBE     RPBE    mGGA    PBE0    exp.(c)
CO          1.201     1.210     1.188   1.191   1.198   1.202   1.186
CH          1.115     1.100     1.119   1.134   1.133   1.134   1.119
HCO         119.4     124.0     120.0   121.0   120.8   120.4   119.8
a(H)        107.2     86.2      130.3   137.8   133.5   129.3   134.2   133
a(C)        -38.8     -24.7     -33.5   -30.2   -29.8   -28.5   -34.6   -39
a(O)        -20.3     -12.9     -15.4   -7.1    -7.7    -8.3    -14.5
a) 6-311G(d,p) basis set, ref. 81; b) [8,4,1/6,1,1] basis set, ref. 77; c) ref. 80.

All these tests show that conventional GGA functionals, like PBE and RPBE, and even τ-dependent functionals like mGGA, do not reach the numerical accuracy of the PBE0 model. As a consequence, only the results of this functional will be considered in what follows. For completeness, they will be shown together with those of the parent pure functional, PBE.
2.6.2 NMR absolute shieldings

A number of approaches have been proposed for computing NMR properties with DF methods [82-86]. Here we will refer explicitly to the GIAO model, which appears particularly effective [82,84]. It has recently been pointed out that the analogous GIAO/MP2 method outperforms all current DF approaches, including the B3LYP method [82]. Furthermore, including exact exchange has a small effect for first-row nuclei, whereas different GGAs give markedly different results [82]. In contrast, ACM3 methods significantly outperform their underlying GGAs in predicting NMR chemical shifts of transition metal compounds [85]. To better understand the numerical performance of PBE and PBE0, we have chosen a fairly large set of molecules. They cover different hybridizations and chemical environments for the nuclei of interest. Table 6 reports the absolute shielding constants for several molecules at the PBE and PBE0 levels of theory. This first set has already been used as a benchmark in a previous NMR study [82]. The table also includes some statistical parameters: the overall mean absolute deviation, the maximum deviation, and the mean absolute deviations for 13C and 15N shieldings. Let us first analyze the 13C results in terms of the different hybridizations of the carbon atom. The data in the table show that both PBE and PBE0 generally give reliable chemical shieldings for sp3 carbons. Furthermore, the PBE0 results are closer to the experimental values than both the MP2 and B3LYP ones. For instance, the PBE0 value for the carbon atoms in ethane is 194 ppm, whereas the PBE value is 191
ppm, both values being close to the experimental finding (195 ppm). The corresponding MP2 and B3LYP values are 174 and 188 ppm, respectively. The case of CF4 is also remarkable. The PBE and B3LYP models predict a low shielding constant (44 and 47 ppm, respectively), which improves significantly at the PBE0 level to a value much closer to experiment (59 vs. 65 ppm). The picture changes dramatically for sp2 carbon atoms. In particular, both the MP2 and B3LYP methods fail to predict the shielding constants, giving values that are either excessively shielded (MP2) or too deshielded (B3LYP). In contrast, both the PBE and PBE0 approaches give results close to the experimental values, with PBE0 performing slightly better. Acetone is a telling example: the σ of its carbonyl carbon is -23 ppm at the MP2 level and only -6 ppm at the B3LYP level. In contrast, the PBE0 value (-11 ppm) is close to the experimental value of -13 ppm. Finally, the PBE and PBE0 models reproduce the chemical shieldings of sp carbon atoms well, both giving values close to the MP2 results and to experiment. In contrast, the B3LYP approach drastically underestimates all these shieldings. The most interesting case here is carbon monoxide (CO). All the DF approaches give too deshielded results, whereas the MP2 values are too shielded. Even so, the PBE0 model agrees best with experiment among all the theoretical methods. As a general remark, the shieldings of the sp3 and sp carbon atoms seem rather insensitive to the inclusion of HF exchange, but strongly dependent on the underlying GGA functional. In contrast, the shieldings of the sp2 carbon atoms improve significantly in going from PBE to PBE0, i.e. when some HF exchange is included.
In summary, the PBE0 absolute shieldings for the carbon atoms are of remarkable quality for all the hybridizations and chemical environments considered here. In particular, the mean absolute deviation for the PBE0 model (4.6 ppm) is lower than those provided by the MP2 approach (6.0 ppm) and by the HF method (7.9 ppm). Thus, the PBE0 functional represents an improvement over conventional quantum mechanical approaches, whereas the B3LYP model (and all the other current functionals) remains essentially at the same level as the HF method [82]. Our set of molecules also allows some analysis of 15N absolute shieldings. In particular, it is remarkable that the PBE functional already gives results closer to experiment than B3LYP, and a further improvement is obtained with the PBE0 model. In contrast, an excessive shielding is obtained by the MP2 method, and just the opposite occurs at the B3LYP level.
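The statistical entries of Table 6 are simple deviations from experiment. As a minimal sketch (the helper names are ours, not from the source), the mean absolute and maximum deviations can be computed as below. The three-molecule subset used here (the PBE0 and experimental 13C values of CH4, C2H6 and CF4) is only illustrative and is not expected to reproduce the 4.6 ppm reported for the full 13C set.

```python
def mean_absolute_deviation(computed, experimental):
    """Mean absolute deviation (ppm) between paired lists of shieldings."""
    if len(computed) != len(experimental) or not computed:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(c - e) for c, e in zip(computed, experimental)) / len(computed)


def max_deviation(computed, experimental):
    """Largest absolute deviation (ppm) in the set."""
    return max(abs(c - e) for c, e in zip(computed, experimental))


# PBE0 and experimental 13C shieldings of CH4, C2H6 and CF4 (Table 6)
pbe0 = [194.0, 179.7, 59.2]
exp = [195.1, 180.9, 64.5]
print(f"m.a.d. = {mean_absolute_deviation(pbe0, exp):.2f} ppm")  # m.a.d. = 2.53 ppm
print(f"max dev. = {max_deviation(pbe0, exp):.1f} ppm")           # max dev. = 5.3 ppm
```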
Table 6. Absolute isotropic shielding constants (σ, ppm) computed at various theoretical levels for 13C, 15N and 17O. Density functional values are computed using the 6-311+G(2d,p) basis set and 6-311G(d,p) geometries. Nuclei are labelled from left to right.

Molecule  Nucleus      HF(a)   MP2(a) B3LYP(a)      PBE     PBE0   exp(b)
CH4       C            195.7    201.5    189.6    190.6    194.0    195.1
C2H6      C            184.0    188.0    173.6    174.0    179.7    180.9
C2H4      C             59.9     71.2     48.7     54.5     58.4     64.5
C2H2      C            113.9    123.3    106.3    111.6    114.0    117.2
CH2CCH2   C            114.0    120.9    104.5    108.6    112.5    115.2
          C            -44.3    -26.0    -51.7    -38.9    -36.6    -28.9
C6H6      C             55.0     64.0     45.2     51.4     55.3     57.2
HCN       C             68.1     87.3     67.2     75.6     76.6     82.1
          N            -56.0      1.0    -53.1    -36.7    -34.9    -20.4
CH3NH2    C            163.8    164.9    150.1    144.9    157.1    158.3
          N            250.0    261.2    238.4    234.0    244.0     n.a.
CH3CN     C            190.9    193.6    180.4    182.2    187.7    187.7
          C             60.6     76.1     57.4     65.8     68.2     73.8
          N            -46.6    -13.2    -40.7    -26.3    -24.4     -8.1
N2        N           -128.7    -44.9   -105.4    -78.5    -76.8    -61.6
CH3OH     C            143.7    142.2    127.4    128.3    136.5     n.a.
          O            274.7    350.6    321.6    320.3    334.7     n.a.
CO2       C             47.8     63.5     46.9     55.8     56.8     58.5
          O            214.8    241.0    206.9    207.8    220.0    243.4
CH2O      C             66.9      6.7    -25.4    -18.4    -11.1     n.a.
          O           -461.2   -341.9   -469.8   -429.3   -422.2     n.a.
CH3COCH3  C            163.5    148.8    164.5    151.3    157.0    158.0
          C            -23.2    -23.2     -5.8    -27.3    -11.1    -13.1
          O           -340.5   -340.5   -279.8   -337.7   -330.2     n.a.
CO        C            -29.2     11.1    -21.7     -8.5     -7.8      1.0
          O            -95.0    -47.4    -87.8    -76.8    -70.0    -42.3
CH3F      C            124.5    121.8    106.6    107.4    116.5    116.8
CF4       C             79.2     64.4     46.5     43.9     59.2     64.5
H2O       O            326.9    344.8    325.7    322.2    328.9    344.0
NH3       N            262.6    276.2    260.3    258.7    263.1    264.5

mean abs. dev.          20.3      8.4     18.4     12.2      7.0
max dev.                67.1     35.6     47.9     35.6     27.7
m.a.d. 13C               7.9      6.0      7.5      8.6      4.6
m.a.d. 15N              41.7     15.9     24.1     13.9     11.8
a) QZ2P basis set, ref. 21; b) experimental data are taken from reference 82.
The largest difference is observed for the nitrogen molecule, whose 15N σ is -78 and -77 ppm at the PBE and PBE0 levels, respectively, in good agreement
with the experimental value (-62 ppm). In contrast, the B3LYP and MP2 approaches predict -105 and -45 ppm, respectively. Experimental 17O absolute shieldings are available for only two of the molecules reported in table 6, namely CO2 and CO. These σ values are badly reproduced by the B3LYP approach, while reliable results are obtained at the MP2 level [82]. The PBE model already represents a significant improvement over the B3LYP method, and even better results are obtained at the PBE0 level. Unfortunately, it must be admitted that even these PBE0 values remain far from those provided by the MP2 method. We have also analyzed a number of NMR shielding constants for hydrogen atoms and found that both the PBE and PBE0 models give results close to experiment, the mean absolute deviation being 0.6 ppm for both methods. The B3LYP method provides a similar deviation (0.7 ppm). These findings are reminiscent of the situation found for EPR parameters, where good results are obtained for hydrogen atoms by several functionals, whereas only a few models provide reliable results for other atoms.
2.6.3 General comments
In conclusion, structural, thermodynamic and vibrational parameters obtained by different hybrid functionals are quite similar and compare favourably with experimental data [87]. This is also the case for EPR and NMR couplings of hydrogen atoms and for dipole moments. Since in the following we will concentrate on properties for which the behavior of the different hybrid functionals is comparable, in most cases we prefer to report B3LYP results, which can be reproduced with standard computer codes and compared with a larger number of previous results. It is, however, remarkable that whenever the long-range behavior of the functional plays a role (e.g. van der Waals complexes, nonlinear optical properties or excitation energies of Rydberg states), the results delivered by the different PBE variants outperform those obtained with previous functionals (B3PW91, B3LYP, etc.). Also for EPR and NMR couplings of heteroatoms, the PBE0 results are generally better than those delivered by other functionals involving extensive ad hoc parametrization; they are actually competitive with the results of low-order perturbative post-HF techniques for well-behaved systems, while being significantly more accurate in the presence of large correlation effects. Other chemical properties of particular interest for biological systems are also well modelled by the B3LYP and PBE0 approaches. For instance, H-bond strengths and polarizabilities are reproduced by these methods with an accuracy comparable to that obtained with low-order perturbation techniques [87,88].
3. VIBRATIONAL AVERAGING
The simplest way to combine electronic structure calculations with nuclear dynamics is to use harmonic analysis to estimate both vibrational averaging effects on physico-chemical observables and reaction rates in terms of conventional transition state theory, possibly extended to incorporate tunneling corrections. This requires, at least, the knowledge of the structures, energetics, and harmonic force fields of the relevant stationary points (i.e. energy minima and first order saddle points connecting pairs of minima). Small amplitude vibrations around stationary points are expressed in terms of normal modes Q, which are linearly related to Cartesian coordinates x:

$$\mathbf{Q} = \mathbf{L}^{\dagger}\,\mathbf{M}^{1/2}\,\Delta\mathbf{x} \tag{24}$$
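where, by convention, all the components of Q vanish at the reference configuration ($\mathbf{x}_0$) and $\Delta\mathbf{x} = \mathbf{x} - \mathbf{x}_0$. The matrix L can be obtained from the following eigenvalue equation:

$$\boldsymbol{\Lambda} = \mathbf{L}^{\dagger}\,\mathbf{M}^{-1/2}\,\mathbf{F}_x(\mathbf{x}_0)\,\mathbf{M}^{-1/2}\,\mathbf{L} \tag{25}$$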
where $\mathbf{F}_x(\mathbf{x}_0)$ is the matrix of Cartesian second derivatives evaluated at the reference geometry and $\Lambda_{ij} = \omega_i^2\,\delta_{ij}$. The development of conceptually transparent and numerically feasible models for the study of anharmonic vibrations and chemical reactions in large systems can then be based on a systematic reduction strategy consisting of: (1) limitation to the part of the nuclear configuration space that is energetically relevant, i.e. to the harmonic valley surrounding stationary points and, possibly, a single effective large amplitude path (LAP) connecting them; and (2) restriction to the subspace that is dynamically relevant, i.e. to the active system consisting of the path tangent and a few other motions (referred to as dynamical) coupled to it (dynamical reduction), the other degrees of freedom (referred to as statistic) being considered adiabatic or even neglected. The large amplitude path, along with the corresponding large amplitude Hamiltonian (LAH), provides an invaluable reference framework for the analysis of nuclear dynamics effects in terms of more or less sophisticated dynamical treatments. While it is, of course, not possible to discuss these aspects in the present contribution, we thought it interesting to sketch some basic developments, with specific reference to the vibrational modulation of observables by low-frequency inversion or torsion motions.
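Equations (24) and (25) amount to diagonalizing the mass-weighted Hessian. The sketch below is written in plain Python so that it needs no numerical library. It uses a cyclic Jacobi eigensolver and a hypothetical one-dimensional O-C-O chain, with one Cartesian coordinate per atom and an arbitrary spring constant k. The eigenvalues of this model are known analytically: 0 (translation), k/m_O and k/m_O + 2k/m_C. Since L is real, the L† of eq. (24) is written as a transpose in the code. For a real molecule, each atomic mass would be repeated for its x, y and z coordinates.

```python
import math


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a small real symmetric
    matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def normal_modes(hessian, masses):
    """Eq. (25): diagonalize M^-1/2 F M^-1/2. Returns the eigenvalues
    (omega^2) in ascending order and L with the normal modes as columns."""
    n = len(masses)
    mw = [[hessian[i][j] / math.sqrt(masses[i] * masses[j]) for j in range(n)]
          for i in range(n)]
    lam, vecs = jacobi_eigh(mw)
    order = sorted(range(n), key=lambda k: lam[k])
    return [lam[k] for k in order], [[vecs[i][k] for k in order] for i in range(n)]


def to_normal_coordinates(L, masses, dx):
    """Eq. (24): Q = L^T M^1/2 dx."""
    n = len(masses)
    return [sum(L[i][k] * math.sqrt(masses[i]) * dx[i] for i in range(n))
            for k in range(n)]


# Hypothetical 1D O-C-O chain with nearest-neighbour springs
k = 1.0
masses = [16.0, 12.0, 16.0]
F = [[k, -k, 0.0], [-k, 2.0 * k, -k], [0.0, -k, k]]
lam, L = normal_modes(F, masses)
omega = [math.sqrt(max(x, 0.0)) for x in lam]        # Lambda_ii = omega_i^2
Q = to_normal_coordinates(L, masses, [1.0, 1.0, 1.0])  # a rigid translation
print("omega:", [f"{w:.4f}" for w in omega])
print("Q for a translation:", [f"{x:.2e}" for x in Q])
```

A rigid translation projects only onto the zero-frequency mode, which is why the last two components of Q vanish.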
The first order approximation to the vibrationally averaged value of an observable O in the vibrational state identified by the array of quantum numbers n is

$$\langle O \rangle_{\mathbf{n}} = O_0 + \Delta O_0 + \sum_i A_i\left(n_i + \tfrac{1}{2}\right) + \sum_{ij} B_{ij}\left(n_i + \tfrac{1}{2}\right)\left(n_j + \tfrac{1}{2}\right) \tag{26}$$

where $O_0$ is the value of O at the reference stationary point and the coefficients $\Delta O_0$, $A_i$ and $B_{ij}$ are explicit functions of the second, third, and fourth derivatives both of the energy and of the observable with respect to the normal modes Q [89]. When the property is sufficiently well represented by a bilinear form of the normal modes
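$$O = O_0 + \sum_i \alpha_i Q_i + \sum_{ij} \beta_{ij} Q_i Q_j \tag{27}$$

its expectation value in the vibrational state can be written

$$\langle O \rangle_{\mathbf{n}} = O_0 + \sum_i \alpha_i \langle Q_i \rangle_{\mathbf{n}} + \sum_{ij} \beta_{ij} \langle Q_i Q_j \rangle_{\mathbf{n}} \tag{28}$$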
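with

$$\langle Q_i \rangle_{\mathbf{n}} = -\frac{\hbar}{2\omega_i^2}\sum_j \frac{F_{ijj}}{\omega_j}\left(n_j + \tfrac{1}{2}\right) \tag{29a}$$

$$\langle Q_i^2 \rangle_{\mathbf{n}} = \frac{\hbar}{\omega_i}\left(n_i + \tfrac{1}{2}\right) \tag{29b}$$

where $\omega_i = \sqrt{\Lambda_{ii}}$ and $F_{ijk}$ is the third derivative of the energy with respect to the normal coordinates $Q_i$, $Q_j$, $Q_k$. The average values at 0 K ($r_z$ structure) are obtained when $n_i = 0$ for all modes, whereas the values at T K ($r_\alpha$ structure when T = 298.15 K) can be obtained in a first approximation (using the harmonic oscillator partition function) by the replacement

$$n_i + \tfrac{1}{2} \;\longrightarrow\; \tfrac{1}{2}\coth\left(\frac{\hbar\omega_i}{2k_BT}\right) \tag{30}$$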
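where $k_B$ is the Boltzmann constant. Since the Cartesian displacement coordinates of the nuclei (in the molecule-fixed Eckart axis system) are related to the normal coordinates by a strictly linear transformation, the average values $\langle Q_i \rangle$ specify the displacements of the average nuclear positions from the equilibrium positions. At the $r_\alpha$ molecular geometry the value of the observable O is

$$O_\alpha = O(\langle \mathbf{Q} \rangle) = O_0 + \sum_i \alpha_i \langle Q_i \rangle + \sum_{ij} \beta_{ij} \langle Q_i \rangle \langle Q_j \rangle \tag{31}$$

and thus, since $\langle Q_i Q_j \rangle - \langle Q_i \rangle \langle Q_j \rangle$ vanishes for $i \neq j$ to this order, the average value can be written

$$\langle O \rangle = O_\alpha + \sum_i \beta_{ii}\left(\langle Q_i^2 \rangle - \langle Q_i \rangle^2\right) \tag{32a}$$

Hence, to first order in the $Q_i$ and retaining only the principal anharmonic and harmonic contributions,

$$\langle O \rangle \simeq O_\alpha + \sum_i \beta_{ii} \langle Q_i^2 \rangle = O_\alpha + \sum_i \frac{\hbar\,\beta_{ii}}{\omega_i}\left(n_i + \tfrac{1}{2}\right) \tag{32b}$$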
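A minimal sketch of the first-order averaging of eqs. (29)-(32b) follows. It assumes units with ħ = 1 (with kT expressed in the same units as ω) and keeps only the diagonal β_ii terms, as in eq. (32b). The cubic constants F_ijj, the derivatives α_i and β_ii, and the harmonic frequencies are taken as given inputs. The function names and the one-mode numbers in the example are hypothetical.

```python
import math


def occupation(omega, n=0, kT=None):
    """n + 1/2 for quantum number n, or its thermal value (1/2) coth(omega / 2kT)
    as in eq. (30); hbar = 1 units are assumed."""
    if kT is None:
        return n + 0.5
    return 0.5 / math.tanh(omega / (2.0 * kT))


def averaged_observable(O0, alpha, beta, omega, f_ijj, n=None, kT=None):
    """First-order vibrational average, eqs. (29) and (32b), with hbar = 1.

    alpha[i] = alpha_i and beta[i] = beta_ii of eq. (27); omega[i] are the
    harmonic frequencies; f_ijj[i][j] is the cubic force constant F_ijj.
    """
    m = len(omega)
    n = n if n is not None else [0] * m
    occ = [occupation(omega[j], n[j], kT) for j in range(m)]
    q_sq = [occ[i] / omega[i] for i in range(m)]                  # eq. (29b)
    q_avg = [-sum(f_ijj[i][j] * occ[j] / omega[j] for j in range(m))
             / (2.0 * omega[i] ** 2) for i in range(m)]            # eq. (29a)
    O_alpha = O0 + sum(a * q for a, q in zip(alpha, q_avg))         # eq. (31), first order
    return O_alpha + sum(b * q2 for b, q2 in zip(beta, q_sq))      # eq. (32b)


# One-mode example with hypothetical parameters
zpv = averaged_observable(0.0, [2.0], [0.5], [1.0], [[0.3]])
hot = averaged_observable(0.0, [2.0], [0.5], [1.0], [[0.3]], kT=1.0)
print(f"zero-point correction: {zpv:.4f}, at kT = omega: {hot:.4f}")
```

The linear term α⟨Q⟩ and the quadratic term β⟨Q²⟩ enter with opposite signs when α and F_iii share a sign, which is the mechanism invoked later for the fluoromethyl radicals.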
The above equations show that physico-chemical observables can be computed using the usual frozen nuclei approximation whenever they are slowly varying functions of the nuclear coordinates (i.e. $\beta_{ii} = 0$) and the dynamics of the system is essentially harmonic (i.e. $\langle Q_i \rangle = 0$). For semirigid molecules, the whole computation can be reduced to the evaluation of a subset of third derivatives of the energy and of diagonal second derivatives of the properties. Unfortunately, this approach is ill-adapted to large amplitude vibrations because of their strongly curvilinear character and the poor convergence of the Taylor expansion of the potential. However, as mentioned above, several interesting situations of this kind are dominated by the dynamics occurring in a small amplitude, many-dimensional harmonic valley surrounding a large amplitude path (LAP). The same approach can of course be extended to large amplitude surfaces, or hypersurfaces, but in the following we shall consider only the basic effective one-dimensional model. Motion along the path can then be described in terms of the arc length along it in mass-weighted (MW) Cartesian coordinates [large amplitude coordinate, LAC, (s)] and of its conjugate momentum ($P_s$). The only necessary condition for the path is that it
must not contain any translational or rotational component. Next, small amplitude (SA) vibrations are described by 3N-7 (3N-6 for linear systems) local normal coordinates ($Q_i$) and their conjugate momenta ($P_i$). These local vibrational coordinates must be orthogonal to the path tangent, to translations, and to infinitesimal rotations. In the adiabatic approach [90], the components of the SA coordinates in the space of MW Cartesian coordinates are the eigenvectors of the Hessian matrix (which is, of course, a function of s) from which translations, rotations, and the path tangent have been projected out. The potential governing the motion along the LAP will be a general function $V^0(s)$, whereas the potential energy contributions for motions orthogonal to the LAP are approximated to second order:

$$V(s,\mathbf{Q}) = V^0(s) + \sum_i V_i^1(s)\,Q_i + \frac{1}{2}\sum_{ij} V_{ij}^2(s)\,Q_i Q_j \tag{33}$$
where the superscripts denote the order of differentiation with respect to the SA motions. The so-called intrinsic reaction path (IRP) is always parallel to the energy gradient, so that the first order contribution $V_i^1(s)$ vanishes. For intramolecular dynamics, however, the simpler distinguished coordinate (DC) approach [92] has the advantage of being isotope independent and well defined also beyond energy minima, while still retaining an almost negligible coupling between the gradient and the path tangent. This model corresponds to constructing the one-dimensional path by optimizing the other geometrical parameters at selected values of a specific internal coordinate q. Note, however, that q is not the coordinate used in the following dynamical treatment, since the distance along the path and the path tangent also depend on the other coordinates, whose values change with q. When the IRP is traced, successive points are obtained by following the energy gradient and therefore do not contain translation or rotation components. Successive points coming from separate geometry optimizations (as in the DC model) introduce the additional problem of their relative orientation: the distance in MW coordinates between adjacent points is altered by the rotation or translation of their respective reference axes. The problem of translation has the trivial solution of centering the reference axes at the center of mass of the system. On the other hand, for non-planar systems the problem of rotations does not have a closed-form solution and has been solved by minimizing the distance between successive points as a function of the Euler angles or of the quaternions of the system [93-96]. The classical kinetic energy for vanishing total angular momentum contains the couplings between the LAC and the local normal coordinates and between
different local coordinates. When these coupling terms are negligible, the adiabatic Hamiltonian governing the motion along the LAP assumes the simple form

$$H^0(s,\mathbf{n}) = \frac{1}{2}P_s^2 + V_{ad}(s,\mathbf{n}) \tag{34}$$

where

$$V_{ad}(s,\mathbf{n}) = V^0(s) - V^0(s_0) + \sum_i \left(n_i + \tfrac{1}{2}\right)\hbar\left[\omega_i(s) - \omega_i(s_0)\right] \tag{35}$$
and $s_0$ refers to a suitable reference structure lying along the LAP. Note that quantization of the SA motions has been introduced in the above equations in terms of their harmonic frequencies ($\omega_i$) and the corresponding quantum numbers ($n_i$). The adiabatic potential obtained when all the quantum numbers are 0 is usually referred to as the ground state vibrationally adiabatic potential [$V_{gs}(s)$] and is well approximated by $V^0(s) - V^0(s_0)$ whenever the frequencies of the SA vibrations do not change very much along the LAP. For non-vanishing angular momenta we must add a rotational contribution
$$\varepsilon_J^{(rot)} = A(s)\,(J^2 - K^2)\cos^2 q_K + B(s)\,(J^2 - K^2)\sin^2 q_K + C(s)\,K^2 \tag{36}$$
where (K, $q_K$) are the projection of the total angular momentum along a body-fixed axis and its conjugate angle variable, and A(s), B(s), C(s) are the three rotational constants as functions of the distance s along the LAP. For applications it is useful to introduce a zeroth order rotational Hamiltonian, which is that of a symmetric top, i.e.
$$\varepsilon_J^{(rot)} = \frac{1}{2}\left[A(s) + B(s)\right](J^2 - K^2) + C(s)\,K^2 + \Delta\varepsilon_J^{(rot)} \tag{37}$$

where

$$\Delta\varepsilon_J^{(rot)} = \frac{1}{2}\left[A(s) - B(s)\right](J^2 - K^2)\cos(2q_K) \tag{38}$$
Neglecting the asymmetric rotor coupling term $\Delta\varepsilon_J^{(rot)}$, the adiabatic representation of the zeroth order Hamiltonian is that of a one-dimensional system (because J, K, and n are conserved):

$$H^0(s,J,K,\mathbf{n}) = \frac{1}{2}P_s^2 + V_{JK\mathbf{n}}(s) \tag{39}$$
with

$$V_{JK\mathbf{n}}(s) = V_{ad}(s,\mathbf{n}) + \frac{1}{2}\left[A(s) + B(s)\right](J^2 - K^2) + C(s)\,K^2 \tag{40}$$
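Equations (35) and (40) only combine quantities tabulated along the path. The sketch below builds the vibrationally adiabatic and zeroth order ro-vibrational potentials on a grid of s values. It assumes ħ = 1 and frequencies and rotational constants sampled on the same grid as $V^0(s)$; the function names and the three-point path are hypothetical. It also codes eq. (36), so that its decomposition into eqs. (37) and (38) can be checked numerically.

```python
import math


def adiabatic_potential(V0, omegas, i_ref, n=None, hbar=1.0):
    """Eq. (35) on a grid of s values.

    V0[k]     -- energy along the LAP at grid point k
    omegas[k] -- harmonic frequencies of the SA modes at grid point k
    i_ref     -- grid index of the reference structure s0
    n         -- SA quantum numbers (all zero gives the ground-state V_gs)
    """
    m = len(omegas[0])
    n = n if n is not None else [0] * m
    return [V0[k] - V0[i_ref]
            + sum((n[i] + 0.5) * hbar * (omegas[k][i] - omegas[i_ref][i]) for i in range(m))
            for k in range(len(V0))]


def rovib_potential(V_ad, A, B, C, J, K):
    """Eq. (40): symmetric-top potential V_JKn(s) on the same grid."""
    return [V_ad[k] + 0.5 * (A[k] + B[k]) * (J * J - K * K) + C[k] * K * K
            for k in range(len(V_ad))]


def asymmetric_rotor_energy(A, B, C, J, K, qK):
    """Eq. (36) at a single value of s."""
    return (A * (J * J - K * K) * math.cos(qK) ** 2
            + B * (J * J - K * K) * math.sin(qK) ** 2 + C * K * K)


# Hypothetical three-point path with one SA mode softening at the barrier top
V0 = [0.0, 1.0, 0.0]
omegas = [[10.0], [9.0], [10.0]]
V_gs = adiabatic_potential(V0, omegas, i_ref=0)
print(V_gs)  # [0.0, 0.5, 0.0]
```

The softening of the SA mode lowers the effective barrier from 1.0 to 0.5 in these units, which illustrates why $V_{gs}(s)$ can differ appreciably from $V^0(s) - V^0(s_0)$ when frequencies change along the path.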
Once the variations of the different terms as functions of s have been appropriately fitted (e.g. by spline functions), the ro-vibrational states supported by this Hamiltonian can be computed by suitable numerical procedures, and the expectation value $\langle O \rangle_j$ of a given observable in the eigenstate $|j\rangle$ corresponding to the eigenvalue $\varepsilon_j$ is given by
$$\langle O \rangle_j = O^0 + \langle j \,|\, \Delta O(s) \,|\, j \rangle \tag{41}$$
where $O^0$ is the value of the observable at a suitable reference structure and $\Delta O(s)$ is the expression (e.g. again a spline fitting) giving its variation as a function of the progress variable s. The temperature dependence of the observable is obtained by assuming a Boltzmann population of the ro-vibrational levels, so that
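$$\langle O \rangle_T = O^0 + \frac{\sum_j \langle j \,|\, \Delta O(s) \,|\, j \rangle \exp\left[(\varepsilon_0 - \varepsilon_j)/k_BT\right]}{\sum_j \exp\left[(\varepsilon_0 - \varepsilon_j)/k_BT\right]} \tag{42}$$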
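The text does not specify the numerical procedure used for eqs. (39)-(42). As one possible sketch, the one-dimensional Hamiltonian can be discretized with three-point finite differences on a uniform grid of the mass-weighted coordinate s, diagonalized with a small Jacobi eigensolver, and the Boltzmann average of eq. (42) then follows directly. The sketch assumes unit mass (s is mass-weighted), ħ = 1, and a wave function that vanishes beyond the grid ends. A harmonic potential with ΔO(s) = s² serves as a check, since the exact values are ε_j = j + 1/2 and ⟨O⟩_T = (1/2)coth(1/2k_BT). This is a teaching-size example with hypothetical function names; a production code would use spline-fitted potentials and a dedicated tridiagonal eigensolver.

```python
import math


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def lap_levels(s, V, hbar=1.0):
    """Eigenstates of eq. (39), H = P_s^2/2 + V(s), on a uniform grid (unit mass).
    Returns ascending energies and grid-normalized eigenvectors."""
    N = len(s)
    h = s[1] - s[0]
    t = hbar ** 2 / (2.0 * h * h)
    H = [[0.0] * N for _ in range(N)]
    for k in range(N):
        H[k][k] = 2.0 * t + V[k]
        if k + 1 < N:
            H[k][k + 1] = H[k + 1][k] = -t
    E, vecs = jacobi_eigh(H)
    order = sorted(range(N), key=lambda j: E[j])
    return [E[j] for j in order], [[vecs[k][j] for k in range(N)] for j in order]


def thermal_average(O0, dO, energies, states, kT):
    """Eqs. (41)-(42): Boltzmann average of <j|dO(s)|j> over the computed levels."""
    num = den = 0.0
    for e, c in zip(energies, states):
        w = math.exp((energies[0] - e) / kT)
        num += w * sum(ck * ck * d for ck, d in zip(c, dO))  # <j|dO|j>, eq. (41)
        den += w
    return O0 + num / den


# Check case: harmonic LAP potential V = s^2/2 (omega = 1) with dO = s^2
N = 40
s = [-5.0 + 10.0 * (k + 1) / (N + 1) for k in range(N)]
V = [0.5 * x * x for x in s]
E, states = lap_levels(s, V)
dO = [x * x for x in s]
print(f"E0 = {E[0]:.4f}, E1 = {E[1]:.4f}")
print(f"<s^2> at kT = 0.05: {thermal_average(0.0, dO, E, states, 0.05):.4f}")
print(f"<s^2> at kT = 1.0:  {thermal_average(0.0, dO, E, states, 1.0):.4f}")
```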
The above equations point out the possibility of computing the reference value of the observable at a very sophisticated level (possibly using a geometry optimized at a lower level) and the vibrational modulation effects (arising from the geometry dependence of both the observable and the electronic energy) at a lower level. Furthermore, since the LAP is tangent to one normal mode (say $Q_f$) at stationary points, a first order perturbative treatment gives
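$$\langle \Delta O \rangle = \frac{\hbar\left(n_f + \tfrac{1}{2}\right)}{2\omega_f}\left[\frac{\partial^2 O}{\partial Q_f^2} - \frac{F_{fff}}{\omega_f^2}\frac{\partial O}{\partial Q_f}\right] + \sum_{i \neq f}\frac{\hbar\left(n_i + \tfrac{1}{2}\right)}{2\omega_i}\frac{\partial^2 O}{\partial Q_i^2} \tag{43}$$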
which is very useful for interpretative purposes. Temperature dependent results can be obtained using equation (30). In the last part of this paper we will see that vibrational averaging plays a significant role in the computation of reliable physico-chemical properties for a number of interesting systems. Here we discuss just the simple case of the isotropic hyperfine coupling constants of the fluoromethyl radicals CHnF3-n. These radicals are characterized by a low frequency inversion motion and by an equilibrium structure whose pyramidality increases with the number of fluorine atoms [96]. They can be treated in a consistent way by choosing as distinguished coordinate the average out-of-plane angle τ:
$$\tau = \frac{1}{3}\left(\theta_1 + \theta_2 + \theta_3\right) - 90^\circ \tag{44}$$
where $\theta_i$ is the angle of the ith substituent at the carbon with respect to an arbitrary axis through the carbon atom. Since it is always possible to find an axis through a central atom that makes an identical angle with each of three unique substituent atoms, τ represents a single degree of freedom, i.e., the valence angles are otherwise unconstrained. Note that for AX3 systems, τ is related to the valence angle XAX by

$$\tau = \cos^{-1}\left[\frac{2}{\sqrt{3}}\sin\left(\frac{\widehat{XAX}}{2}\right)\right] \tag{45}$$
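The out-of-plane angle of eq. (44) can be computed directly from Cartesian coordinates. A convenient reference axis is the normal to the plane through the tips of the three unit bond vectors, because it makes the same angle with all three bonds. The sketch below (plain Python, hypothetical helper names) does this and compares the result with eq. (45) for an ideal tetrahedral AX3 unit, for which τ = arcsin(1/3) ≈ 19.47°.

```python
import math


def tau_from_valence_angle(xax_deg):
    """Eq. (45): out-of-plane angle (degrees) of a C3v AX3 unit."""
    x = 2.0 / math.sqrt(3.0) * math.sin(math.radians(xax_deg) / 2.0)
    return math.degrees(math.acos(min(1.0, x)))


def tau_from_coordinates(center, subs):
    """Eq. (44): average out-of-plane angle (degrees) of three substituents.

    The axis n is orthogonal to u2 - u1 and u3 - u1 (u_i = unit bond vectors),
    so it makes the same angle with every bond; it is oriented away from the
    substituents so that each theta_i >= 90 degrees."""
    u = []
    for p in subs:
        d = [p[k] - center[k] for k in range(3)]
        r = math.sqrt(sum(c * c for c in d))
        u.append([c / r for c in d])
    a = [u[1][k] - u[0][k] for k in range(3)]
    b = [u[2][k] - u[0][k] for k in range(3)]
    n = [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]
    norm = math.sqrt(sum(c * c for c in n))
    n = [c / norm for c in n]
    if sum(n[k] * u[0][k] for k in range(3)) > 0.0:
        n = [-c for c in n]
    thetas = [math.degrees(math.acos(max(-1.0, min(1.0, sum(n[k] * ui[k] for k in range(3))))))
              for ui in u]
    return sum(thetas) / 3.0 - 90.0


s8 = math.sqrt(8.0 / 9.0)
angles = [0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0]
tetrahedral = [(s8 * math.cos(a), s8 * math.sin(a), -1.0 / 3.0) for a in angles]
planar = [(math.cos(a), math.sin(a), 0.0) for a in angles]
print(f"{tau_from_coordinates((0.0, 0.0, 0.0), tetrahedral):.2f}")  # 19.47
```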
The behavior of the coupling constants as a function of τ is qualitatively very similar for the different radicals: a(C) is always positive and increases with τ due to the progressive contribution of carbon s orbitals to the singly occupied molecular orbital (SOMO). The effect is similar for a(F) and a(H), but since a(H) is negative for the planar conformation (due to first order spin polarization), the absolute value of a(H) decreases up to τ ≈ 10° and then increases. This allows vibrational averaging effects to be discussed simply in terms of the potential governing the out-of-plane motion. Table 7 contains the most significant results obtained for these radicals by coupled cluster computations (CCSD(T)) employing the Chipman basis set [97].
In a static approach, we should distinguish between planar (CH3) and non-planar (all the other radicals) systems. The dynamic approach shifts attention to the position of the ground vibrational level (above or below the barrier to planarity, if any). In the present context, this means that both CH3 and CH2F belong to the class of quasi-planar systems, whereas CHF2 and CF3 are truly pyramidal molecules. The ground vibrational wave function of quasi-planar systems is peaked at the planar structure which, irrespective of being a minimum or a saddle point, is the natural reference configuration for vibrational averaging.

Table 7. Out-of-plane angle (τmin, degrees), inversion barrier (ΔE, kJ mol-1), harmonic (ω) and fundamental (νinv) inversion frequencies (cm-1), and hyperfine coupling constants (G) computed at the energy minimum (stat.), upon vibrational averaging (dyn.) and from EPR spectra (exp.). Available experimental values of angles, barriers and frequencies are given in parentheses.

            CH3             CH2F            CHF2            CF3
τmin        0.0             10.0            14.8 (16.3±?)   17.5 (18.2)
ΔE          0.0             2.7             35.8 (35.5±5)   128.5
ω           488             705             1076            717
νinv        613 (603)       250 (260)       932 (949)       786 (701)
a(C) stat.  27.8            68.0            152.1           267.1
     dyn.   37.7            54.2            147.4           266.7
     exp.   38.3            54.8            148.8           271.6
a(H) stat.  -24.6           -16.5           20.6
     dyn.   -22.6           -21.5           18.2
     exp.   -23.1           -21.1           22.2
a(F) stat.                  66.5            86.5            151.1
     dyn.                   64.7            84.2            150.5
     exp.                   64.3            84.7            142.4
The wave function of the ground vibrational state, being symmetrically spread around τ = 0°, introduces contributions from pyramidal configurations. This results in a significant increase of the absolute values of the coupling constants, which are brought into remarkable agreement with experiment. On the other hand, radicals characterized by a deep double well potential (CHF2 and CF3) can be effectively treated as systems governed by a single well potential rising unsymmetrically on the two sides of the minimum energy configuration. Harmonic contributions vanish since the variation of the coupling
constants near the equilibrium conformation is linear. When anharmonic terms are taken into account, vibrational averaging brings the coupling constants to the values that would be obtained for $\tau < \tau_{min}$, since the ground state vibrational wave function is more localized inside the potential well, even under the barrier, than outside. However, high energy barriers imply high vibrational frequencies and, as a consequence, smaller average displacements around the equilibrium positions. Since $F_{fff}$ and $\partial O/\partial Q_f$ have the same sign near the equilibrium structure, the linear term in equation (43) is negative, thus counterbalancing the positive quadratic term. This explains the good agreement between experiment and static computations.

4. SOLVENT EFFECTS
The importance of solvent effects on chemical properties and reactivity can hardly be overestimated: indeed, nearly all biochemical processes occur in condensed phases, and the interactions between the system of interest and the surrounding medium must be taken into account to achieve a realistic physico-chemical description. The many models used to depict solute-solvent interactions can be grouped into three broad classes:
• molecular mechanics (MM) or molecular dynamics (MD) simulations (with hundreds or thousands of molecules interacting through classical force fields);
• cluster calculations (i.e. ab initio studies on small aggregates of solvent molecules surrounding the system of interest);
• continuum models (with a sharp cut between the "solute", i.e. the chemical system of interest, and the solvent, represented as a structureless polarizable medium characterised by macroscopic quantities, mainly the dielectric constant and the density).
All the above approaches have merits and deficiencies in particular applications, but the last one is often preferable for at least two reasons: attention can be focused on the solute, which can be studied at the highest levels available for isolated molecules, while the solvent effects are introduced as perturbations treated at the same level of accuracy; and the great number of possible arrangements of the solvent molecules around the solute is naturally "averaged" by using macroscopic solvent characteristics. As reported in the following, very accurate and effective algorithms have been developed to describe solute-solvent interactions in the framework of continuum models [98,99,100].
Clearly, the physical model of a structureless polarizable solvent is itself sometimes inadequate: in most of these cases,
however, it has been shown that it is sufficient to extend the description by introducing a few explicit solvent molecules strongly bound to the solute, so that one obtains a new partition of the system, with the solute and (part of) the first solvation shell treated at a high quantum level and the bulk of the solvent described by the continuum. Of course, this mixed discrete-continuum description must be adopted when one or more solvent molecules are directly involved in the chemical reaction under study. Apart from these cases, we found that equilibrium energies and reaction barriers in solution are usually well reproduced by "pure" continuum descriptions, while a mixed discrete-continuum approach is useful to investigate the solvent effect on spectra and other electronic properties in protic solvents like water. Our computational strategy for the study of biochemical systems in solution applies a solvent model belonging to the family of polarizable continuum models (PCM), introduced in 1981 [101] and continuously updated and extended since [102-113]. Recently, very refined and effective PCM algorithms have been implemented in widely used ab initio codes, so that molecules in solution can now be studied at any level of theory (from molecular mechanics to Hartree-Fock, MP2 and configuration interaction) with a very limited computational overhead with respect to the corresponding calculations in vacuo. Presently, the most complete PCM implementation is available in the Gaussian series of programs [68] (Gaussian98 and the development version, Gaussian99, which will be distributed in the near future), allowing for:
• energy calculations at the MM, HF, DFT, MP2, CISD, CCSD and complete active space (CAS-SCF) levels;
• analytic gradient calculations (and hence geometry optimisations) at the MM, HF, DFT, MP2 and CAS-SCF levels;
• analytic force constant calculations at the MM, HF, DFT and MP2 levels.
The analysis is not limited to the ground state: excitation energies in solution can be computed at the CAS-SCF level or with time-dependent HF or DF theory, and the solute geometry can be optimised in any excited state at the CAS-SCF level. Work is in progress to extend the possibility of geometry optimisations in solution to time-dependent HF and DF as well. An important issue when biochemical systems are involved is the performance of the method for very large solutes: the PCM computational bottlenecks have been analysed for this problem, and specific algorithms have been developed to extend this treatment to solutes with hundreds or thousands of atoms, as briefly summarized in the following. The need for mixed methods (in which a small part of a biomolecule is treated at a high ab initio level, while the remaining part is described less accurately) is now commonly acknowledged for many applications, and several models have been proposed. Of course, the solvent description must also be adapted to such
methods: presently, PCM can be used with the ONIOM method [111] implemented in Gaussian, though some work remains to be done to extend the polarizable continuum solvation model to other mixed procedures. Thanks to the improvements in computational codes and hardware performance, this field is now becoming accessible to many researchers, and we expect that the great number of interesting applications will also stimulate the development of new models and algorithms.
4.1 Outline of the PCM.
The details of the various PCM procedures have been exhaustively reported in many papers [101,102,107], to which the reader is referred for precise information; here we shall simply sketch the main characteristics of the method. The partition between solute and solvent is performed by defining a "cavity" in the polarizable medium: the solute is placed inside the cavity, where the relative dielectric constant is 1 (i.e. the value of vacuum), while outside the dielectric constant has the macroscopic value of the solvent (for example 78.39 for water at 298 K, 10.36 for 1,2-dichloroethane, and so on). The shape and the size of the cavity are important parameters of the method.
Figure 3. GePol cavity for the β-alanine zwitterion subdivided into tesserae with average areas of 0.4 Å² and 0.2 Å², respectively.
Unlike many other continuum approaches, PCM adopts cavities of realistic shape modelled on the solute atoms: they are built according to the GePol algorithm [114], in which the cavity is defined as the envelope of spheres centred on solute atoms or atomic groups. Besides the atomic spheres, GePol adds other spheres to smooth the solute-solvent boundary, approximating the so-called solvent accessible surface proposed by Connolly.
The cavity surface is then subdivided into small domains, called surface tesserae, used to express as finite sums all the surface integrals needed to compute the solvent reaction field, as explained below. The final result is depicted in figure 3, where we show the GePol cavity for β-alanine subdivided into tesserae with average areas of 0.4 Å² and 0.2 Å², respectively. The solute-solvent interactions are accounted for by a perturbation added to the solute Hamiltonian operator:

$$\hat{H} = \hat{H}^0 + \hat{V}_\sigma \tag{46}$$
where $\hat{H}^0$ is the Hamiltonian of the isolated solute, and $\hat{V}_\sigma$ expresses the solvent reaction field due to the dielectric polarisation induced by the presence of the solute. It can be shown that the whole reaction field can be described in terms of an apparent charge density σ appearing on the cavity surface, possibly corrected to account for the small fraction of the solute electron cloud escaping from the cavity. For computational convenience, σ is assumed to be constant within each tessera, so that $\hat{V}_\sigma$ can be expressed as a sum of point charges ("solvation charges", $\{q_i\}$) placed at the centres of the surface tesserae:

$$\hat{V}_\sigma(\mathbf{r}) = \sum_i^{\text{tesserae}} \frac{q_i}{|\mathbf{r} - \mathbf{r}_i|} \tag{47}$$
In the original PCM version (dielectric PCM, DPCM) the solvation charges depend on the normal component of the electric field on the cavity surface; in other versions (CPCM and IEFPCM) they depend on the electrostatic potential on the cavity surface. In any case, the solvation charges are determined by
• the solute nuclear and electronic density (hence $\hat{V}_\sigma$ is a nonlinear perturbation, depending on the final wave function);
• the solvation charges themselves, so that a set of N coupled linear equations must be solved to find the charges, where N is the number of tesserae.
The actual reaction field can be computed with different PCM algorithms, which can be chosen according to the particular problem investigated: in general, at each SCF cycle the solvation charges are determined by solving a linear system of the form

$$\mathbf{D}\,\mathbf{q} = -\mathbf{b}_{sol} \tag{48}$$
where vector q contains the charges, vector $\mathbf{b}_{sol}$ contains the solute electrostatic potential (or the normal component of the solute electric field), and matrix D depends on the solvent dielectric constant and on geometric parameters related to the shape of the cavity. The system of equations (48) can be solved either by inverting matrix D or with iterative procedures, possibly accelerated by specific algorithms such as the conjugate gradient method. Once the charges have been obtained, the perturbation operator is computed according to equation (47), and the perturbed Hamiltonian (equation (46)) is used exactly as $\hat{H}^0$ in usual calculations. This approach offers the great advantage that all the techniques developed for the study of isolated molecules can be extended to systems in solution, provided the corresponding expressions are corrected for the contribution of $\hat{V}_\sigma$. In particular, the Fock operator in the presence of solvation charges becomes
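$$\hat{F} = \hat{F}^0 + \hat{V}_\sigma = \hat{F}^0 + \sum_i \frac{q_i}{|\mathbf{r}-\mathbf{r}_i|} = \hat{F}^0 - \sum_{ij} \frac{D^{-1}_{ij}\left(V_j^{nuc} + V_j^{el}\right)}{|\mathbf{r}-\mathbf{r}_i|} \tag{49}$$

(or the analogous expression with $E_j^{nuc} + E_j^{el}$ in place of $V_j^{nuc} + V_j^{el}$ when the charges depend on the normal electric field)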
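where $V_j^{nuc} = \sum_n Z_n/|\mathbf{r}_n - \mathbf{r}_j|$ (the sum running over the solute nuclei) and $V_j^{el} = -\langle \Psi \,|\, |\mathbf{r} - \mathbf{r}_j|^{-1} \,|\, \Psi \rangle$ are the nuclear and the electronic contributions to the solute electrostatic potential in tessera j, and $E_j^{nuc}$, $E_j^{el}$ are the corresponding components of the electric field normal to the surface. Expression (49) allows one to express the perturbed Fock operator in an atomic basis:

$$F_{\mu\nu} = F^0_{\mu\nu} - \sum_{ij} \langle \mu \,|\, |\mathbf{r} - \mathbf{r}_i|^{-1} \,|\, \nu \rangle \, D^{-1}_{ij} \left(V_j^{nuc} + V_j^{el}\right) \tag{50}$$

or the corresponding expression with the electric field operator.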
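Equation (48) can be illustrated with the simplest variant whose answer is known analytically. The sketch below is a toy conductor-like (C-PCM/COSMO-type) model, not the DPCM or IEFPCM matrices discussed in the text. It uses a single spherical cavity represented by equal-area points on a Fibonacci lattice, the commonly used diagonal element D_ii = 1.07 √(4π/a_i), off-diagonal elements 1/|r_i - r_j|, and the dielectric scaling f(ε) = (ε - 1)/ε. For a unit charge at the centre of the sphere, the total solvation charge should approach -(ε - 1)/ε and ½Σ q_i V_i should approach the Born energy -(1 - 1/ε)/2R, which provides a check. The function names and parameters are illustrative assumptions (atomic units).

```python
import math


def sphere_tesserae(n, radius):
    """n quasi-uniform (Fibonacci) points on a sphere centred at the origin;
    each stands for a tessera of area 4 pi R^2 / n."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((radius * r * math.cos(golden * i),
                    radius * r * math.sin(golden * i),
                    radius * z))
    return pts


def solve(a, b):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            if f != 0.0:
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cpcm_charges(tesserae, area, eps, point_charges):
    """Conductor-like form of eq. (48): D q = -f(eps) V (equal tessera areas)."""
    n = len(tesserae)
    D = [[1.07 * math.sqrt(4.0 * math.pi / area) if i == j
          else 1.0 / math.dist(tesserae[i], tesserae[j])
          for j in range(n)] for i in range(n)]
    V = [sum(Z / math.dist(t, pos) for Z, pos in point_charges) for t in tesserae]
    f = (eps - 1.0) / eps
    return solve(D, [-f * v for v in V]), V


# Born check: unit charge at the centre of a sphere of radius R (atomic units)
R, eps, N = 2.0, 78.39, 150
tess = sphere_tesserae(N, R)
q, V = cpcm_charges(tess, 4.0 * math.pi * R * R / N, eps, [(1.0, (0.0, 0.0, 0.0))])
G_el = 0.5 * sum(qi * vi for qi, vi in zip(q, V))   # electrostatic part of the free energy
G_born = -(1.0 - 1.0 / eps) / (2.0 * R)
print(f"total charge {sum(q):.4f}, G_el {G_el:.5f}, Born {G_born:.5f} hartree")
```

Direct elimination scales as N³ in the number of tesserae, which is why iterative solvers become attractive for large cavities.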
Since the reaction field operator depends on the solute wave function (i.e. it is a nonlinear perturbation to $\hat{H}^0$), it can be shown that the variational minimisation leads to the following quantity

$$G = \langle \Psi \,|\, \hat{H}^0 \,|\, \Psi \rangle + \frac{1}{2}\sum_i q_i \left(V_i^{nuc} + V_i^{el}\right) \tag{51}$$
having the status of a free energy in solution (the factor 1/2 appears because one half of the solute-solvent interaction energy has been spent to create the solvation charges). Equations (49), (50) and (51) can be differentiated with respect to external perturbations (e.g. electric or magnetic fields) and with respect to nuclear coordinates, allowing the analytical computation of free energy gradients [103,104] and second derivatives [106,110]; these are used for geometry optimisations in solution and for the calculation of force constants, polarizabilities, etc. In particular, a very effective algorithm for the computation of the derivative of G with respect to geometrical parameters, applicable to all the variants of PCM, has been introduced recently. It leads to the addition of the following term to the gradient of the isolated molecule (computed, of course, with the electron density converged in solution) [110]:
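$$\frac{1}{2}\Big(\sum_i^{\rm tesserae} q_i V_i\Big)^x = \sum_i^{\rm tesserae} q_i V_i^x + \frac{2\pi\varepsilon}{\varepsilon-1}\sum_{i\in\Gamma}\frac{q_i^2}{a_i^2}\,U_i(x) \qquad (52)$$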
where $V_i = V^{\rm nuc}_i + V^{\rm el}_i$, $V_i^x$ is the partial derivative of $V_i$ with respect to the parameter $x$, and $a_i$ is the area of tessera $i$. The last sum runs over the part of the cavity ($\Gamma$) that moves as a consequence of the nuclear displacement. The term $U_i(x)$ is related to the geometrical parameters of the tesserae and can be computed exactly, at least for GePol cavities. However, a number of studies have shown that the so-called rigid cavity approximation, in which the last term of equation (52) is neglected, is often a good approximation for polar solutes in polar solvents. Under the same conditions, the derivatives of the non-electrostatic contributions can also be neglected. In the rigid (fixed) cavity approximation, differentiating equation (51) with respect to a second nuclear displacement, $y$, gives
$$\frac{1}{2}\Big(\sum_i^{\rm tesserae} q_i V_i\Big)^{xy} = \sum_i^{\rm tesserae} q_i V_i^{xy} + \sum_{i,j}^{\rm tesserae} V_i^x\, D^{-1}_{ij}\, V_j^y \qquad (53)$$
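In the rigid cavity approximation $\mathbf D$ does not depend on the nuclear coordinates. The electrostatic terms of equations (52) and (53) then follow from differentiating $\tfrac12\mathbf V^{\rm T}\mathbf D^{-1}\mathbf V$ with $\mathbf q=\mathbf D^{-1}\mathbf V$, a step that assumes a symmetric $\mathbf D$. The sketch below checks both expressions against finite differences on a toy problem. Everything in it is an assumption made for illustration: a random symmetric positive-definite D, six tesserae, and a hypothetical analytic dependence of the potentials on two coordinates x and y. The density-matrix response terms are deliberately left out.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_tess = 6                                    # toy number of tesserae

# Toy symmetric positive-definite D; rigid cavity => D independent of x and y.
A = rng.normal(size=(n_tess, n_tess))
D = A @ A.T + n_tess * np.eye(n_tess)
D_inv = np.linalg.inv(D)

# Hypothetical smooth dependence of the tessera potentials V_i on two coordinates.
c0, cx, cy, cxy = rng.normal(size=(4, n_tess))

def potentials(x, y):
    return c0 + cx * np.sin(x) + cy * np.cos(y) + cxy * x * y

def half_qV(x, y):
    """(1/2) sum_i q_i V_i with q = D^-1 V (rigid cavity)."""
    v = potentials(x, y)
    return 0.5 * v @ (D_inv @ v)

x0, y0, h = 0.3, -0.2, 1e-3
q0 = D_inv @ potentials(x0, y0)
V_x = cx * np.cos(x0) + cxy * y0              # analytic dV_i/dx
V_y = -cy * np.sin(y0) + cxy * x0             # analytic dV_i/dy
V_xy = cxy                                    # analytic d2V_i/dxdy

grad_analytic = q0 @ V_x                      # eq. (52) without the U_i(x) term
hess_analytic = q0 @ V_xy + V_x @ D_inv @ V_y # eq. (53)

# Central finite differences of (1/2) sum_i q_i V_i.
grad_fd = (half_qV(x0 + h, y0) - half_qV(x0 - h, y0)) / (2 * h)
hess_fd = (half_qV(x0 + h, y0 + h) - half_qV(x0 + h, y0 - h)
           - half_qV(x0 - h, y0 + h) + half_qV(x0 - h, y0 - h)) / (4 * h * h)

print(f"gradient: analytic {grad_analytic:.8f}  finite difference {grad_fd:.8f}")
print(f"hessian : analytic {hess_analytic:.8f}  finite difference {hess_fd:.8f}")
```

The first derivative only needs the charges and the potential derivatives. The second derivative also needs the charge response $\mathbf D^{-1}\mathbf V^y$, which is why the $\sum_{ij}V_i^xD^{-1}_{ij}V_j^y$ term appears.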
Terms related to the derivative of the density matrix also appear; they lead to modified coupled perturbed equations, as described in detail in ref. [106].
4.2 Extension to large solutes.
Accurate calculations on very large systems, such as biomolecules, raise computational problems even for isolated systems, and the inclusion of solvent effects should not create further, specific problems. From this point of view, the main PCM bottlenecks are
• the construction of the cavity;
• the solution of the linear system (48).
The first point becomes important when thousands of atomic spheres have to be defined and cross-checked to eliminate the parts of the surface lying inside the cavity. In the case of globular proteins, for example, more than 90% of the generated spheres are eventually discarded because they are not in contact with the solvent, but they still add to the time spent building the cavity. If all or a large part of the solute is treated at the MM level, the GePol procedure is the most time-consuming step of the calculation, growing as $N_{\rm at}^2$ (where $N_{\rm at}$ is the number of solute atoms). To avoid this bottleneck, an alternative procedure has been developed to build very large cavities. In brief, instead of defining a sphere around each solute atom, the whole molecule is inserted in a large sphere, which is "deflated" until it adheres to the external surface of the solute. This procedure, called DefPol [115], becomes competitive with GePol above 1000-2000 atomic spheres, as described in ref. 76. Although DefPol is still inferior to GePol in some respects (for example, one cannot compute analytical gradients of the position of surface tesserae), it is well suited to most applications on biomolecules, and its computational times are satisfactorily limited. In figure 4 we present the GePol and DefPol cavities for a model polypeptide.
As for the solution of the linear system, the standard approach based on inverting the D matrix (see equation (48)) becomes unmanageable for very large solutes because of both the computational time and the disk storage it requires. To deal with these cases an iterative procedure has been developed [112], which solves equation (48) without building and inverting the full D matrix. A specific two-step extrapolation technique proved very effective for this problem, especially for the PCM variant based on the normal component of the electric field. On the other hand, the PCM versions based on the electrostatic potential converge less easily, but they can also be treated by this procedure. This is to be expected, because in the former case the mutual interaction between two solvation charges i, j decreases as $|r_{ij}|^{-3}$, and in the latter as $|r_{ij}|^{-1}$. In all cases the iterative approach is more convenient than matrix inversion for cavities with a few thousand surface tesserae.
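An iterative solver needs only the product of the matrix with a trial charge vector, so the inverse never has to be formed or stored. The sketch below illustrates this with plain conjugate gradients, the method mentioned at the start of this section. It is not the two-step extrapolation of ref. [112]. Several assumptions are made for illustration: D is replaced by a symmetric C-PCM/COSMO-type matrix ($1/r_{ij}$ off the diagonal, $1.07\sqrt{4\pi/a_i}$ on it), the "cavity" is a single sphere tiled by Fibonacci points of equal area, and the solute is a point charge. Conjugate gradients require a symmetric positive-definite matrix. The non-symmetric D of other PCM variants would need a different solver, such as GMRES or the scheme of ref. [112].

```python
import numpy as np

def fibonacci_sphere(n_points, radius):
    """Nearly uniform points on a sphere, used as toy tessera centres."""
    k = np.arange(n_points) + 0.5
    theta = np.arccos(1.0 - 2.0 * k / n_points)        # polar angle
    phi = np.pi * (1.0 + np.sqrt(5.0)) * k             # golden-ratio spiral azimuth
    return radius * np.column_stack((np.sin(theta) * np.cos(phi),
                                     np.sin(theta) * np.sin(phi),
                                     np.cos(theta)))

def cpcm_matrix(points, areas):
    """Symmetric C-PCM/COSMO-like matrix (toy stand-in for D)."""
    r = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(r, 1.0)                   # placeholder, overwritten below
    s = 1.0 / r
    np.fill_diagonal(s, 1.07 * np.sqrt(4.0 * np.pi / areas))
    return s

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A given only x -> A x."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rr = r @ r
    b_norm = np.linalg.norm(b)
    for iteration in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rr / (p @ ap)
        x += alpha * p
        r -= alpha * ap
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol * b_norm:
            return x, iteration
        p = r + (rr_new / rr) * p
        rr = rr_new
    raise RuntimeError("conjugate gradient did not converge")

# Toy problem in atomic units: unit point charge inside a spherical cavity in water.
n_tess, radius, eps = 400, 6.0, 78.39
centres = fibonacci_sphere(n_tess, radius)
areas = np.full(n_tess, 4.0 * np.pi * radius**2 / n_tess)
S = cpcm_matrix(centres, areas)

solute_charge, solute_pos = 1.0, np.array([0.0, 0.0, 1.0])
v_solute = solute_charge / np.linalg.norm(centres - solute_pos, axis=1)
b = -(eps - 1.0) / eps * v_solute                          # C-PCM right-hand side

q_iter, n_iter = conjugate_gradient(lambda x: S @ x, b)    # matrix-vector products only
q_direct = np.linalg.solve(S, b)                           # dense reference solution

print(f"CG iterations: {n_iter}")
print(f"total charge {q_iter.sum():.4f}, Gauss-law target {-(eps - 1.0) / eps:.4f}")
```

In this formulation the only expensive step is `S @ x`. That product is what a fast multipole method accelerates. Instead of building the dense matrix, the interactions between distant groups of tesserae are evaluated approximately.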
Figure 4. GePol and DefPol cavities for the alanine tripeptide in α-helix conformation.
Even better performance can be obtained by applying the so-called fast multipole method (FMM), which was developed to evaluate long-range interactions among very large numbers of particles efficiently. The FMM approach can be applied to equation (48) by dividing the surface into regions. The interaction between two solvation charges is computed exactly if they belong to the same region, and approximately (with an increasingly truncated multipole expansion) if they belong to different regions. So far this approach has been tested on long polymeric chains with several units and has proved extremely convenient. In conclusion, the combined use of DefPol cavities and iterative (possibly FMM) solvers for the electrostatic problem extends the PCM description of solute-solvent interactions to any solute that can be studied in vacuo, without loss of accuracy with respect to the
traditional PCM treatment, and with very limited computational times and memory requirements. Of course, to take full advantage of these efficient PCM formulations, they have to be linked to some computational procedure able to describe very large solutes, and several mixed methods have recently been proposed for such systems. In mixed methods, a small part of the studied (macro)molecule is treated at a high level, often by some quantum mechanical (QM) approach. The remaining part of the system is studied less accurately, for instance with molecular mechanics (MM) techniques, in which case the procedure is also called QM/MM. The advantage of QM/MM procedures is that attention can be focused on the part of the system that is most interesting chemically, while the interactions within the whole molecule are still taken into account, at least partially. PCM can currently be combined with a QM/MM procedure called ONIOM [108], as implemented in the development version of the Gaussian program. In ONIOM, large molecules are divided into two or three layers, which are studied with decreasing levels of accuracy, and the results for each layer are then combined to estimate the energy and properties of the full system. To allow PCM/ONIOM studies of large molecules in solution, both procedures have been slightly modified. In particular, the cavity must be carefully defined for the different solute layers. Furthermore, when a small layer is studied at a high level, one has to decide which dielectric constant to associate with the outer layers
(they are neither vacuum nor solvent). In any case, if proper attention is paid to these modelling aspects, the procedures described above allow accurate and effective study of solute-solvent interactions even for very large solutes.
Figure 5. Calculated and experimental solvation free energies (kJ/mol) for a number of neutral (a) and ionic (b) molecules.
4.3 Some examples.
The simplest, yet very useful, application of the continuum solvent model described above is the calculation of solvation free energies ΔG(sol), i.e. the free energy changes associated with transferring a molecule from the gas to the liquid phase. With a proper parameterisation of cavity size and shape, very accurate ΔG(sol) values can be computed for a number of neutral and charged solutes, as illustrated in figure 5 [116].
Figure 6. Calculated and experimental pKa of carboxylic acids, as reported in ref. 111.
Solvation free energies can be used to evaluate conformational preferences, energy minima and reaction profiles for any chemical system in solution. Of course, the quality of the results also depends on the level of the theoretical approach (i.e. on the calculation in vacuo), but in many cases including solvent effects does not degrade the performance of the overall description. In this framework, it is very useful to evaluate the solvent effects on the solute geometry (sometimes called the "indirect" solvent effect) and on the vibrational frequencies (adding zero-point energy corrections to the calculated free energies). As briefly sketched above, PCM computes both geometry and vibrational corrections effectively. PCM has recently been used for the ab initio prediction of the pKa of a number of carboxylic acids [111]. Although this is a delicate task even for isolated systems, depending critically on the level of the calculation in vacuo, the results are satisfactory, as shown in figure 6. Another application of great interest for biomolecular studies is the computation of conformational maps in solution. Solute-solvent interactions can influence biomolecular conformations deeply, yet only a few such studies are currently available. Nonetheless, conformational studies that take solvent effects into account can be expected to increase greatly in the near future. In figure 7 we report the Ramachandran map for the tyrosine model peptide computed in vacuo and in aqueous solution with the AMBER force field. It clearly shows the importance of solvent effects: a new minimum appears in a region not accessible in vacuo.
Figure 7. Ramachandran map of a tyrosine peptide analogue calculated with the AMBER force field in vacuo (a) and in aqueous solution (b).
At a more refined level, PCM can be used to evaluate the environmental effects on the solute wave function, i.e. on electronic properties. The first straightforward application is the prediction of solvent effects on IR, Raman and magnetic spectra. Infrared and Raman spectra can be studied with the same procedure used to evaluate zero-point energies, which provides vibrational frequencies in solution. As an example, figures 8 and 9 show IR and Raman spectra of thymine computed in vacuo and in aqueous solution at the PBE0/6-311G(d,p) level. UV spectra usually involve transitions between electronic states, so simple Hartree-Fock and DFT calculations are often not sufficient. PCM has recently been extended to multi-configurational (MC-SCF) calculations [113] and to time-dependent approaches, which allow the description of excited states and hence the prediction of the so-called solvatochromic effects on these spectra.
Nuclear magnetic resonance (NMR) and electron paramagnetic resonance (EPR) spectra are even more influenced by solute-solvent interactions. Moreover, experimental data are often very difficult to interpret without the support of reliable ab initio calculations, especially for EPR, which is usually applied to unstable radical species. Because of the great importance of such applications, the most advanced PCM calculations on some biomolecules will be analyzed in greater detail in the next section. We end this section by stressing once again that, thanks to the effective and reliable continuum solvent models now available, almost all the computational techniques developed in past years for isolated systems can be extended to the liquid phase.
5. APPLICATIONS
This part of the chapter is devoted to the structural analysis of some flexible molecules in the condensed phase, for which a quantum mechanical approach becomes mandatory even for a qualitative study (e.g. zwitterions, radicals).
Figure 8. IR spectrum of thymine (PBE0/6-311G(d,p) in vacuo and CPCM(water)/PBE0/6-311G(d,p); abscissa: wavenumber / cm⁻¹).
Figure 9. Raman spectrum of thymine (PBE0/6-311G(d,p) in vacuo and CPCM(water)/PBE0/6-311G(d,p); abscissa: wavenumber / cm⁻¹).
5.1 Conformational analysis including solvent effects
The conformational characteristics of biomolecules are of paramount importance for understanding both their reactivity and their affinity for specific receptors. The natural environment of these systems is aqueous solution, and solute-solvent interactions can often change the behaviour of the isolated molecules considerably. This is particularly important for flexible molecules, whose bioactive conformation often does not correspond to the most stable unbound structure. The energy needed to reach the active conformation is therefore an important factor in structure-activity studies, and any useful model must evaluate it accurately [118,119]. Charged groups further complicate the delicate balance between intramolecular and solute-solvent interactions, especially in aqueous solution. The case presented here is the β-alanine zwitterion (BAZ), the smallest zwitterionic molecule able to form an intramolecular hydrogen bond (see figure 10). The conformational behaviour of this system has recently been established by NMR spectroscopy [120,121]. NMR data show that gauche and anti arrangements of the aminic and carboxylate moieties characterize the only two conformers present in aqueous solution. The two conformers have nearly the same stability, with at most a slight preference for the anti conformer.
Figure 10. Structure of the anti and gauche conformers of BAZ obtained at the HF/6-31+G(d)/CPCM level.
The fully relaxed structures of both zwitterion conformers obtained in aqueous solution at the HF/6-31+G(d)/CPCM level are shown in figure 10 [122]. In vacuo the gauche conformer is not an energy minimum: a full geometry optimization would lead to the transfer of one ammonium proton to the carboxylate group. On the other hand, this conformer becomes a true minimum
when solvent effects are considered. Gas-phase energies can thus be obtained either with constrained optimizations (avoiding the proton transfer) or using the geometries optimized in solution; in the following we use the latter approach. Energy calculations performed in vacuo at the HF/6-31+G(d) level, using geometries optimized in aqueous solution, show that the gauche conformer is favored by 86.2 kJ/mol. This value resembles the energy differences reported in the literature for the breaking of an internal hydrogen bond in zwitterionic systems [120,123,124]. It is essentially due to the ability of the gauche conformer to form an intramolecular H-bond between the carboxylate and ammonium groups. Thus only the gauche conformer would be present in the gas phase, contrary to the interpretation of NMR data in aqueous solution [120,121]. The solvation free energies ΔG calculated for the gauche and anti conformers at the HF/6-31+G(d) and B3LYP/6-31+G(d) levels, at the corresponding optimized geometries, are shown in Table 8. The table also partitions them into contributions arising from different groups. As usual for polar solvents, the main contribution to the solute-solvent interaction is the electrostatic term, which at the HF/6-31+G(d) level stabilizes the anti conformer by more than 79 kJ/mol relative to the gauche conformer. To proceed further, we recall that ΔG can be decomposed into two parts. The first is an electrostatic contribution, ΔG(el), corresponding to solute-solvent interactions with the wave function already polarized by the solvent. The second is the polarization work, ΔG(pol), needed to polarize the solute wave function from its optimum value in vacuo. Of course, only the first term can be dissected into contributions originating from different spheres of the cavity.
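The Table 8 entries imply a simple budget. The sphere contributions sum to ΔG(el), adding the (positive) polarization work gives the total electrostatic term ΔG(el,tot), and adding the non-electrostatic term gives ΔG. The short sketch below rebuilds the totals from the parts using the numbers of Table 8. The dictionary layout and function name are illustrative only.

```python
# Table 8 entries (kJ/mol), HF/6-31+G(d) with PCM.
sphere_dg_el = {
    "gauche": {"N1": -86.36, "C2": -20.84, "C3": -8.37,
               "C4": -17.24, "O5": -47.24, "O6": -87.78},
    "anti":   {"N1": -136.06, "C2": -17.82, "C3": -11.30,
               "C4": -21.71, "O5": -76.27, "O6": -94.39},
}
dg_pol = {"gauche": 42.93, "anti": 53.18}      # polarization work (positive cost)
dg_nonel = {"gauche": -1.84, "anti": -1.59}    # non-electrostatic terms

def solvation_budget(conformer):
    """Rebuild the Table 8 totals from the sphere contributions."""
    sphere_sum = sum(sphere_dg_el[conformer].values())
    dg_el_total = sphere_sum + dg_pol[conformer]
    dg_total = dg_el_total + dg_nonel[conformer]
    return sphere_sum, dg_el_total, dg_total

for conf in ("gauche", "anti"):
    s, el_tot, tot = solvation_budget(conf)
    print(f"{conf:6s}: spheres {s:8.2f}  dG_el(tot) {el_tot:8.2f}  dG {tot:8.2f}")

# Which spheres drive the anti/gauche difference?
for atom in sphere_dg_el["anti"]:
    diff = sphere_dg_el["anti"][atom] - sphere_dg_el["gauche"][atom]
    print(f"{atom}: anti - gauche = {diff:+.2f} kJ/mol")
```

The rebuilt totals agree with the tabulated ones to within rounding (a few hundredths of a kJ/mol). The sphere-by-sphere differences are largest for N1 (about −50 kJ/mol, the ammonium group) and O5 (about −29 kJ/mol, carboxylate). This is consistent with the statement that the ionized moieties dominate.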
Table 8 shows this dissection for both conformers at the HF/6-31+G(d) level. The largest contributions to the solvation energies come from the ionized moieties of the zwitterion, which are more exposed to the solvent in the anti conformer than in its gauche counterpart. A more sophisticated analysis treated the torsional motion around the N1C2C3C4 dihedral angle (φ) as the single large-amplitude mode of the molecule. Figures 11a and 11b show the potential energy curves obtained in vacuo and in aqueous solution, together with the low-energy vibrational wave functions. When solute-solvent interactions are not considered, the gauche conformer is markedly more stable than the anti conformer; as a consequence, the lowest vibrational levels involve the gauche conformer only. In solution the anti conformer becomes the global minimum, separated from the two equivalent gauche conformers by a torsional barrier of 13.6 kJ/mol, and the low-energy vibrational states are localized around each minimum.
[Figure 11 panels: relative energy versus torsion angle φ (degrees, 0-360), (a) in vacuo, (b) in aqueous solution.]
Figure 11. Energy profiles and vibrational levels corresponding to the torsional motion around φ in BAZ, calculated in vacuo (a) and in aqueous solution (b).
Table 8. Partition of the solute-solvent electrostatic interaction (ΔG(el), kJ/mol) and of the cavity surface into spheres centered on different solute atoms. Hydrogen atoms are contained in the same sphere as the atom to which they are bonded.
              Gauche                Anti
Sphere        Surface   ΔG(el)      Surface   ΔG(el)
N1            32.14     -86.36      36.23     -136.06
C2            21.83     -20.84      20.75     -17.82
C3            22.29     -8.37       21.50     -11.30
C4            12.89     -17.24      13.33     -21.71
O5            12.01     -47.24      15.34     -76.27
O6            16.59     -87.78      16.58     -94.39
ΔG(el,tot)              -224.89               -304.39
ΔG(pol)                 42.93                 53.18
Non-el.                 -1.84                 -1.59
ΔG                      -226.73               -305.98
Comparing the two curves, the gauche energy minimum corresponds to a torsional angle of about 50° in vacuo and about 61° in solution. A smaller dihedral angle brings the ammonium and carboxylate groups closer together, which indicates that the intramolecular hydrogen bond is stronger in vacuo than in aqueous solution. 300 MHz proton NMR spectra recorded at room temperature show that BAZ behaves as an A2B2 system, with a vicinal 3J(HAHB) coupling constant of 6.70 Hz [120]. This can be considered the average of the coupling constants 3J(H2H3) and 3J(H2H3') (see figure 12):
Figure 12. Newman projections of the BAZ stable conformers (A, G and G').
since the experimental signal shows a line width of 0.7 Hz, the primitive coupling constants cannot differ by more than 0.8-0.9 Hz [120,121]. When only the gauche and anti conformers are significantly populated, their percentages can be estimated with the extended Karplus equation proposed by Altona and coworkers [125], assuming that the gauche and anti minima correspond exactly to φ = 60° and φ = 180°, respectively: the experimental
averaged value of 3J(HAHB) leads to nearly equal amounts of the A, G+ and G− conformers (35, 32.5 and 32.5%, respectively), i.e. to a free energy difference of about 0.4 kJ/mol between gauche and anti conformers. The values of 3J(HAHB) along the large-amplitude torsional path were calculated with the equation proposed by Altona and then averaged according to the DiNa procedure described in section 3. The coupling constants of the gauche and anti conformers are sufficiently different (8.88 and 5.92 Hz, respectively) to allow an unbiased evaluation of the populations of both structures. The value averaged over the ground vibrational state alone (as would be obtained at 0 K) is already not far from the experimental value, and thermal effects bring the computed value into full agreement with its experimental counterpart (6.67 vs 6.76 Hz). This finding confirms both the reliability of the extended Karplus equation of ref. [125] and the usefulness of our integrated approach, which includes solvent and vibrational averaging effects at the same time. The present results show that PCM is a very powerful tool for describing the behavior in solution of compounds able to form intramolecular hydrogen bonds. Direct computation of vicinal NMR coupling constants including vibrational averaging effects further confirms this conclusion and paves the way for more reliable quantum mechanical studies of bioactive systems.
5.2 Characterization of organic free radicals. Structure and magnetic properties.
Organic free radicals are key intermediates in a number of reactions of biological significance. For instance, there is strong evidence that the biosynthesis of several natural substances and many enzymatic reactions involve amino acid radicals [126-129], and radiation damage to DNA is known to proceed through a number of base-centered radicals [130-132]. Furthermore, organic free radicals can be exploited as spin probes in the study of macromolecular systems by EPR spectroscopy [133]. A full characterization of these intermediates would provide significant mechanistic information, but it is not simple experimentally. Usually only spectroscopic techniques can be employed, and the relationships between spectroscopic parameters and structural features are quite indirect. Furthermore, the measured quantities often result from the superposition of different contributions that are very difficult, if not impossible, to separate. In such circumstances, quantum mechanical computations can provide invaluable support to experiment. They can switch a number of interactions on and off selectively, and thus allow an unbiased evaluation of the effect of different contributions. The following description of the theoretical characterization of organic radicals can be considered as a template both for the methodological approach to
be adopted for this kind of system and for the general importance of the chosen examples, which belong to the most interesting classes of organic free radicals: α-amino acid-centered, pyrimidine and purine base-centered, and NO-centered radicals.
5.2.1 Glycine radical.
Among the radicals derived from H-atom abstraction from glycine, the C-centered ones are significantly more stable [134,135]; their general structures and atom labels are shown in Figure 13.
Figure 13. Schematic drawing of the glycine radical.
In Table 9 we report the geometrical parameters for the zwitterionic glycine radical optimized in vacuum and in aqueous solution at the B3LYP/6-31G(d,p) level. They refer to the two zwitterionic conformers drawn in figure 13. In the first, the H1(N) and Os atoms are in an anti orientation; in the second, these atoms are nearly eclipsed and engaged in an intramolecular hydrogen bond. As often found for zwitterionic species [136,137], these conformers do not correspond to energy minima in the gas phase. On the other hand, environmental effects stabilize the zwitterion in aqueous solution, where PCM computations show that conformer 2 becomes a local minimum.
Table 9. Geometric parameters (Å and degrees) for the zwitterionic glycine radical calculated at the B3LYP/6-31+G(d,p) level.
                 conformer 1   conformer 1        conformer 2
                 gas phase     aqueous solution   aqueous solution
bond lengths
Cα-N             1.486         1.458              1.451
Cα-C'            1.500         1.481              1.482
C'-Os            1.271         1.268              1.272
C'-Oa            1.245         1.264              1.262
N-H1             1.023         1.026              1.037
N-H2             1.031         1.029              1.029
bond angles
N-Cα-C'          110.87        118.13             115.91
Cα-C'-Os         111.44        116.11             115.11
Cα-C'-Oa         115.78        115.80             116.76
H1-N-Cα          115.19        111.62             106.03
H2-N-Cα          107.31        111.81             112.83
dihedral angles
H2-N-Cα-C'       54.42         59.00              119.68
In vacuo, the constrained zwitterion is less stable than the neutral form by 211 kJ/mol at the B3LYP/EPR-II level. Solvent effects strongly reduce the energy gap between the protomeric forms, and conformer 2 becomes less stable than the neutral form by only 102.5 kJ/mol at the same level. However, contrary to the parent amino acid, the zwitterionic form of the radical never predominates in aqueous solution, in agreement with the experimental results [135-130]. In Table 10 the isotropic hcc's computed for both conformers in vacuo and in aqueous solution are compared with the experimental results, obtained by recording the EPR spectrum of the glycine radical in the solid state [138].
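A Boltzmann ratio shows why a gap of 102.5 kJ/mol still rules out any appreciable zwitterion population. The rough sketch below converts the two gaps quoted above into population ratios at 298.15 K. It treats the quoted energy differences as free energies and ignores entropy and thermal corrections, so it is an order-of-magnitude illustration rather than part of the original analysis.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3   # gas constant in kJ/(mol K)

def population_ratio(delta_kjmol, temperature=298.15):
    """Boltzmann ratio [less stable]/[more stable] for an energy gap in kJ/mol."""
    return math.exp(-delta_kjmol / (R_KJ_PER_MOL_K * temperature))

gaps = {"gas phase (constrained zwitterion)": 211.0,
        "aqueous solution (conformer 2)": 102.5}
for label, gap in gaps.items():
    print(f"{label}: zwitterion/neutral ~ {population_ratio(gap):.1e}")
```

Each factor of ten in population corresponds to about 5.7 kJ/mol at room temperature (RT ln 10). A gap of roughly 100 kJ/mol therefore leaves the zwitterion many orders of magnitude below the neutral form.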
Table 10. Hcc's (G) calculated for the zwitterionic glycine radical at the B3LYP/EPR-II level.
                  conformer 1   conformer 1   conformer 2
                  gas phase     aq. soln.     aq. soln.     expt.
a(N)              -3.1          -2.9          -3.2          -3.5
a(Cα)             38.2          35.1          34.0          45.0
a(C')             -11.2         -12.7         -13.1
a(Hα)             -23.7         -22.8         -21.5         -23.6
1/3(H1+H2+H3)     17.9          18.9          18.6          16.6
Since the experimental EPR spectrum shows unambiguously that in the solid state the three H(N) atoms are equivalent (probably due to tunneling) [138], only the average of the H(N) hcc's is considered. Introducing environmental effects does not significantly modify the results obtained for the isolated radical, which are already in fair agreement with experiment. Thus, the magnetic properties of the glycine radical in this zwitterionic form are scarcely affected by the crystalline environment, and computations on the isolated radical reproduce its EPR spectra well. In particular, the hcc of the α-hydrogen atom has a value close to that of typical aliphatic π-radicals. In Table 11 we report the most significant geometric parameters of the energy minima found at the B3LYP level for both the neutral and the anionic form of the glycine radical, in the gas phase and in aqueous solution (DPCM calculations).
Table 11. Geometric parameters (Å and degrees) for the neutral and anionic glycine radical calculated at different levels.
                 Neutral radical                  Anionic radical
                 gas phase(a)  aq. solution(b)    gas phase(a)  aq. solution(b)
bond lengths
Cα-N             1.362         1.356              1.401         1.375
Cα-C'            1.431         1.427              1.487         1.457
C'-Os            1.232         1.239              1.278         1.283
C'-Oa            1.366         1.360              1.268         1.283
N-H1             1.012         1.013              1.024         1.017
N-H2             1.007         1.010              1.018         1.015
bond angles
N-Cα-C'          117.47        118.40             118.39        121.17
Cα-C'-Os         124.06        123.94             114.96        117.54
Cα-C'-Oa         113.53        113.50             116.90        117.74
H1-N-Cα          115.49        117.56             106.98        115.03
H2-N-Cα          120.53        120.90             116.06        119.20
dihedral angles
Hα-Cα-C'-Os                                       172.55        178.14
H1-N-Cα-C'                                        12.79         15.20
H2-N-Cα-C'                                        139.50        157.71
(a) B3LYP/6-31G(d,p) and B3LYP/6-31+G(d,p) calculations for the neutral and anionic form, respectively; (b) B3LYP/6-31G(d,p)/DPCM and B3LYP/6-31+G(d,p)/DPCM calculations for the neutral and anionic form, respectively.
From a structural point of view, the most important feature is that, contrary to the parent amino acids, energy minima are found only for planar or nearly planar arrangements of the whole molecule, except for the two aminic hydrogen atoms.
Indeed, the sum of the valence angles around N is 326° in neutral glycine and 353° in the corresponding radical. Two factors cause this trend. First, the sp3 Cα atom of glycine is replaced by a nearly sp2 radical center. Second, the electron-withdrawing (capto) carboxyl group and the electron-donating (dative) aminic moiety in its free-base form act synergistically, allowing effective electron delocalization and increasing the stability of the radical. This in turn induces a strong resistance to any deformation that destroys the planarity of the molecular backbone, which counterbalances the increased strength of the intramolecular hydrogen bridges characterizing nonplanar conformations [139]. This so-called captodative effect [140] is lost when the amino group is protonated, which strongly modifies the characteristics of the protomeric radical species with respect to the parent amino acids. In both cases, the solvent shortens both the N-Cα and Cα-C' bonds, whereas the C'-O bonds become slightly longer, corresponding to an increased weight of ionic resonance structures in polar media. Moreover, the length of the intramolecular hydrogen bond is markedly increased in aqueous solution (from 2.371 to 2.426 Å). However, the most significant result is that in aqueous solution the equilibrium structure is closer to planarity. The effect is particularly significant for H2 in the anionic molecule, whose torsional angle increases from 139.5° to 157.7°. The hcc's obtained at the B3LYP/EPR-II level are shown in Table 12.
The calculated hcc's can be dissected into three terms:
• a contribution due to the electronic and structural configurations assumed by the radicals in the gas phase (first column in Table 12);
• a contribution due to the solvent-induced polarization of the solute wave function without any relaxation of the gas-phase geometry (direct solvent effect: difference between the second and first columns in Table 12);
• a last contribution due to the solvent-induced geometry relaxation (indirect solvent effect: difference between the third and second columns in Table 12).
The calculated a(Hα) for the neutral form improves markedly when solvent-induced polarization is properly taken into account. On the other hand, indirect solvent effects are particularly important for the aminic hydrogen hcc's, which become much more similar to each other. The better agreement with experiment obtained for structures closer to planarity (such as those obtained in solution) suggests that inversion motions at the radical center and at the aminic moiety may affect the EPR parameters significantly. A simple but effective treatment of this effect computes the vibrational states supported by an effective large-amplitude coordinate (LAC) joining the minimum and the planar structure. The planar structure is the transition state of a single out-of-plane motion involving all three hydrogens. The hcc's can then be calculated for a number of structures along this path and averaged by a numerical procedure.
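The averaging step just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The LAC is treated as a mass-weighted coordinate Q (in amu^1/2 bohr, the unit used in Figure 14) with kinetic operator −(1/2)d²/dQ², and the one-dimensional Schrödinger equation is solved on a three-point finite-difference grid. The double-well potential and the hcc profile are hypothetical analytic forms, not the B3LYP data. Thermal averages use Boltzmann weights over the ten lowest states, as done for Table 13.

```python
import numpy as np

AMU_IN_ME = 1822.888486          # electron masses per unified atomic mass unit
KB_HARTREE = 3.166811563e-6      # Boltzmann constant in hartree/K
HARTREE_PER_KJMOL = 1.0 / 2625.4996

def vibrational_states(q_grid, potential, n_states):
    """Lowest eigenstates of -(1/2) d2/dQ2 + V(Q); Q in amu^1/2 bohr, V in hartree."""
    dq = q_grid[1] - q_grid[0]
    t = 1.0 / (2.0 * AMU_IN_ME * dq**2)          # finite-difference kinetic coupling
    n = q_grid.size
    hamiltonian = (np.diag(potential + 2.0 * t)
                   - t * np.eye(n, k=1) - t * np.eye(n, k=-1))
    energies, vectors = np.linalg.eigh(hamiltonian)
    psi = vectors[:, :n_states] / np.sqrt(dq)    # normalised: sum(psi**2) * dq = 1
    return energies[:n_states], psi

def state_averages(q_grid, psi, prop):
    """<psi_n| A(Q) |psi_n> for every state n supplied."""
    dq = q_grid[1] - q_grid[0]
    return (psi**2 * prop[:, None]).sum(axis=0) * dq

def thermal_average(energies, per_state, temperature):
    """Boltzmann-weighted average over the supplied states (T = 0: ground state)."""
    if temperature == 0:
        return per_state[0]
    weights = np.exp(-(energies - energies[0]) / (KB_HARTREE * temperature))
    return (weights * per_state).sum() / weights.sum()

# Hypothetical symmetric double-well inversion potential and hcc profile.
q = np.linspace(-2.5, 2.5, 801)                     # LAC grid, amu^1/2 bohr
barrier, q_min = 2.0 * HARTREE_PER_KJMOL, 0.8       # 2 kJ/mol barrier, minima at +/-0.8
v = barrier * (1.0 - (q / q_min) ** 2) ** 2
hcc_profile = -2.0 - 3.0 * q**2                     # hypothetical a(H) in gauss

energies, psi = vibrational_states(q, v, n_states=10)
per_state = state_averages(q, psi, hcc_profile)
avg_0k = thermal_average(energies, per_state, 0)
avg_298k = thermal_average(energies, per_state, 298.15)
print(f"a(H): planar {hcc_profile[q.size // 2]:.2f} G, 0 K {avg_0k:.2f} G, 298 K {avg_298k:.2f} G")
```

A uniform grid keeps the sketch short. With a barrier below the zero-point level, as here, the ground-state density spreads over the planar region, and the averaged coupling differs from the value at any single geometry.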
Table 12. Hcc's (G) for the neutral and anionic glycine radical calculated at different levels.
Neutral form
          B3LYP/EPR-II(a)  B3LYP/EPR-II/DPCM(a)  B3LYP/EPR-II/DPCM(b)  Expt.(c)
a(N)      6.06             6.40                  5.49                  6.38
a(Cα)     12.28            9.23                  8.38
a(C')     -8.99            -6.75                 -6.58
a(Os)     -3.79            -3.68                 -3.81
a(Oa)     -0.10            -0.72                 -0.77
a(Hα)     -14.71           -12.91                -12.85                11.77
a(H1)     -5.31            -5.89                 -6.71                 5.59
a(H2)     -2.04            -2.89                 -6.05                 5.59
Anionic form
          B3LYP/EPR-II(a)  B3LYP/EPR-II/DPCM(a)  B3LYP/EPR-II/DPCM(b)  Expt.(c)
a(N)      7.58             8.14                  6.14                  6.1
a(Cα)     24.14            19.74                 14.85
a(C')     -10.97           -9.99                 -9.30
a(Os)     -0.62            -0.78                 -1.10
a(Oa)     -2.13            -2.37                 -3.02
a(Hα)     -17.08           -16.33                -15.64                13.8
a(H1)     -1.06            -1.56                 -3.12                 3.4
a(H2)     20.63            18.75                 -0.63                 2.9
(a) geometry optimized in the gas phase; (b) geometry optimized in aqueous solution; (c) absolute values.
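The three-term dissection described before Table 12 is simple arithmetic on its columns. Direct effect = (DPCM at the gas-phase geometry) − (gas phase). Indirect effect = (DPCM at the solution geometry) − (DPCM at the gas-phase geometry). The sketch below applies this to the hydrogen couplings. The dictionary layout and the `dissect` helper are illustrative names, not from the original work.

```python
# Hydrogen hcc's (G) from Table 12, columns:
# (gas-phase geometry in vacuo, gas-phase geometry + DPCM, DPCM-optimized geometry + DPCM)
table12 = {
    "neutral": {"a(Ha)": (-14.71, -12.91, -12.85),
                "a(H1)": (-5.31, -5.89, -6.71),
                "a(H2)": (-2.04, -2.89, -6.05)},
    "anion":   {"a(Ha)": (-17.08, -16.33, -15.64),
                "a(H1)": (-1.06, -1.56, -3.12),
                "a(H2)": (20.63, 18.75, -0.63)},
}

def dissect(values):
    """Split a coupling into gas-phase, direct and indirect solvent contributions."""
    gas, polarized, relaxed = values
    return {"gas": gas,
            "direct": polarized - gas,         # solvent polarization, frozen geometry
            "indirect": relaxed - polarized}   # solvent-induced geometry relaxation

for form, rows in table12.items():
    for label, values in rows.items():
        parts = dissect(values)
        print(f"{form:8s} {label}: gas {parts['gas']:7.2f}  "
              f"direct {parts['direct']:+6.2f}  indirect {parts['indirect']:+6.2f}")
```

For the neutral radical, a(Hα) changes almost entirely through the direct term (+1.80 G, against +0.06 G indirect). For a(H2), the indirect term dominates (−3.16 G). This matches the discussion in the text.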
The energy profiles and vibrational wave functions calculated in vacuo and in aqueous solution for both the neutral and the anionic form are shown in figure 14. Figure 15 shows how the hcc's of Hα and of the aminic hydrogens depend on the effective large-amplitude coordinate. In Table 13 we list the averaged hcc values at 0 K (i.e. considering the vibrational ground state only) and at 298 K (i.e. considering the Boltzmann population of the ten lowest vibrational states). Vibrational averaging affects the Hα hcc only slightly, and to the same extent in vacuo and in aqueous solution. This confirms that the solvent effect on this parameter is mainly due to direct polarization. In contrast, the out-of-plane motion appreciably affects the hcc's of the aminic hydrogens. For the neutral molecule this effect operates both in the gas phase and in aqueous solution, although it is more pronounced in solution (see Tables 12 and 13).
[Figure 14 panels: potential energy and vibrational levels versus LAC (amu^1/2 bohr), in vacuum and in water.]
Figure 14. Energy potential and vibrational states for inversion of the neutral glycine radical (a) and of the anionic glycine radical in vacuum (b) and in aqueous solution (c).
Table 13. Averaged hcc values (G) at 0 and 298 K for the neutral and anionic glycine radicals calculated at the B3LYP/EPR-II level.
Neutral radical
          in vacuo           in aq. soln.       exptl.
          0 K      298 K     0 K      298 K
a(N)      5.25     5.81      5.39     5.67      6.38
a(Cα)     12.00    12.23     8.35     8.49
a(Hα)     -14.64   -14.72    -12.84   -12.91    11.77
a(H1)     -6.18    -5.59     -6.84    -6.40     5.59
a(H2)     -3.90    -2.42     -6.20    -5.52     5.59
Anionic radical
          in vacuo           in aq. soln.       exptl.
          0 K      298 K     0 K      298 K
a(N)      7.34     7.28      5.52     6.02      6.1
a(Cα)     24.21    24.19     14.99    15.26
a(Hα)     -17.41   -17.15    -15.70   -15.82    13.8
a(H1)     -1.42    -1.47     -4.53    -3.61     3.4
a(H2)     20.51    20.25     -2.46    -0.86     2.9
On the other hand, we observe strong indirect solvent effects for the anionic radical. This trend is reasonable because, in aqueous solution, both the neutral and the anionic forms of the radical have an inversion barrier lower than the zero-point energy of the out-of-plane vibration. The ground vibrational wave function is peaked at the planar structure and significantly delocalized. This produces large averaging effects and coupling constants close to those obtained in a static description of a quasi-planar structure. The neutral radical already behaves this way in the gas phase. The gas-phase anion radical, by contrast, has a significant inversion barrier and can be effectively treated as a system governed by a single-well potential that rises unsymmetrically on the two sides of the equilibrium structure. Of course, this implies a small vibrational amplitude around the equilibrium structure and negligible averaging effects. Thus, indirect solvent effects are quite small for the neutral radical, whereas they become very important for the anion. Specific solute-solvent interactions, which continuum models do not take into account explicitly, can play a significant role, especially for charged species. A typical cluster formed by the glycine anion radical with four water molecules is shown in figure 16.
The analogous supermolecule obtained for the neutral form is quite similar, except that the interaction energies involved are much weaker. The number and positions of the solvent molecules were determined by molecular dynamics simulations performed with the AMBER force field [141].
[Figure 15 plot: panels (a) and (b), hydrogen hcc's along the LAC (amu^1/2 bohr) in vacuum and in water; panel (b) curves labeled H2 and Hα.]
Figure 15. Values of the hydrogen hcc's of the glycine radical (a) and the glycine radical anion (b) along the LAC in vacuo and in solution.
The carboxylic group appears strongly bound to three solvent molecules, while a fourth water molecule is more loosely bound to the H2 aminic hydrogen. The hcc's calculated at the B3LYP/EPR-II level for the clusters are shown in Table 14. A number of test computations showed that the water molecules have different effects on the various solute atoms. However, none of the first-shell solvent molecules can be neglected if quantitative results are sought. It is also noteworthy that spin transfer between the free radical and the solvent is very limited. The main effect of specific interactions is rather to enhance the solute polarization through strong H-bonds with the molecules of the first solvation shell. The specific (first solvation shell) and bulk (continuum) contributions to hcc shifts are of comparable magnitude, so neither the supermolecule nor a pure continuum model alone provides converged results.
Table 14. Hcc's (G) for clusters of the neutral and anionic glycine radicals with four water molecules, calculated at the B3LYP/EPR-II level.

               gas phase    aqueous solution
Anion radical + 4 H2O
a(N)              4.19          4.27
a(Cα)             6.17          5.69
a(C')            -4.78         -4.49
a(Os)            -3.72         -3.65
a(Oa)            -1.35         -1.45
a(Hα)           -11.81        -11.57
a(H1)            -8.94         -9.07
a(H2)            -9.42         -9.53
Neutral radical + 4 H2O
a(N)              3.01          3.92
a(Cα)            18.05         12.76
a(C')            -9.31         -7.14
a(Os)            -0.82         -1.12
a(Oa)            -2.63         -3.01
a(Hα)           -16.69        -14.57
a(H1)            -7.44         -8.50
a(H2)            -7.89         -9.05

To proceed further, we recall that hcc's can be decomposed into two contributions:
- a delocalization term, which is always positive (or null);
- a spin-polarization (indirect) contribution, which is positive at the radical center and generally negative for hydrogen atoms.

In the planar structure the delocalization term vanishes, since the singly occupied molecular orbital (SOMO) is a pure π orbital whose nodal plane coincides with the molecular plane (see figure 17). Spin polarization is responsible for the strongly negative hcc of Hα and is roughly proportional to the π spin density at Cα. Delocalization of the SOMO therefore reduces the magnitude of the Hα hcc. Figure 17 thus explains why the absolute value of the Hα hcc is slightly higher for the anion radical than for the neutral radical. It also explains why both values are significantly lower than in the zwitterionic form. Furthermore, the Hα hcc in the zwitterionic radical is close to the reference value of the methyl radical, for which delocalization is, of course, impossible. Solvent effects enhance delocalization along the backbone because of an increased weight of ionic resonance forms involving N=Cα or Cα=C' double bonds. The consequence is a reduction of the Hα hcc and of the pyramidality of the aminic moiety. Spin-polarization effects are nearly constant for H1 and H2. Their hcc's are therefore modified essentially by variations of the delocalization term, which arises from displacement of any atom above or below the nodal plane of the SOMO. Figure 17 illustrates this trend well; it is quite similar in vacuo and in aqueous solution.
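The proportionality between the Hα spin-polarization coupling and the π spin density on Cα is a McConnell-type relation, a(Hα) ≈ Q ρπ(Cα). The sketch below shows how the coupling can be read as a spin-density estimate. It assumes Q = -23.0 G, the experimental methyl radical coupling, for which ρπ = 1 and no delocalization is possible. The inputs are the 0 K values of the strongly negative Hα coupling listed in Table 13. The sketch neglects the direct (delocalization) term and any geometry dependence of Q. It is a rough reading of the trend, not the authors' analysis.

```python
# McConnell-type estimate of the pi spin density on C-alpha from a(H-alpha).
# Assumption: Q = -23.0 G (methyl radical H coupling, rho_pi = 1 by construction).
Q_METHYL_G = -23.0


def pi_spin_density(a_h_alpha_gauss: float, q_gauss: float = Q_METHYL_G) -> float:
    """Return rho_pi(C-alpha) ~ a(H-alpha) / Q (direct term neglected)."""
    return a_h_alpha_gauss / q_gauss


# a(H-alpha) in G at 0 K, taken from Table 13
couplings = {
    "neutral, vacuo": -14.64,
    "neutral, water": -12.84,
    "anion, vacuo": -17.41,
    "anion, water": -15.70,
}

for label, a in couplings.items():
    print(f"{label:15s} a(Ha) = {a:7.2f} G -> rho_pi(Ca) ~ {pi_spin_density(a):.2f}")
```

These inputs give about 0.64 (neutral) and 0.76 (anion) in vacuo, with lower values in water. This is in line with the qualitative argument above: the anion keeps more spin on Cα, and solvation increases SOMO delocalization.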
Figure 16. Glycine radical anion with four water molecules.
In conclusion, for the anion radical, solvent effects turn a strongly pyramidal species into a quasiplanar one, with a dramatic variation of the direct contribution to the H2 hcc. At the same time, delocalization of the SOMO increases. This reduces the π spin density on Cα, which in turn leads to a smaller spin polarization on Hα. Finally, it is important to remark that the contributions to the hcc's considered here differ significantly between the neutral and charged species. These contributions are internal motions, and specific and bulk solute-solvent interactions acting on both the electronic and the nuclear configuration. In particular, vibrational and bulk solvent contributions are markedly more important for the anion radical. This is due to the greater flexibility (related to less effective conjugation) and the larger extension of the electronic cloud of the charged species. Thus, none of these contributions can be neglected in a reliable description. This points out, once again, the need for a comprehensive and reliable computational tool for the study of unstable species in solution.
Figure 17. SOMO of glycine radicals in the zwitterionic, neutral and anionic forms obtained at the B3LYP/EPR-I//DPCM level.
5.2.2 5,6-Dihydro-6-thymyl and 5,6-dihydro-5-thymyl radicals
Pyrimidine and purine bases are preferential DNA targets for free-radical-mediated damage. In the case of thymine, H atoms add preferentially to the C5=C6 double bond. The resulting radicals can be further transformed, yielding different stable products with a saturated C5-C6 bond. Two of these H adducts are key intermediates leading to thymine lesions such as 5,6-dihydrothymine [143]: the 5,6-dihydro-5-thymyl radical (referred to as 5-yl, see figure 18) [142] and the 5,6-dihydro-6-thymyl radical (referred to as 6-yl, see figure 18). For the 5-yl radical, different experimental conditions lead to a wide range of hyperfine coupling constants for the hydrogen atoms near the radical center (Hβ atoms) [144-147]. For the 6-yl radical, the few data available have not yet allowed an unambiguous assignment of the recorded hyperfine coupling constants to the Hα and Hβ atoms [148,149].
Energy minima found at several levels of computation correspond to quite similar bond lengths and valence angles for both radicals. The maximum differences are 0.025 Å for interatomic distances and 1.5° for angles [150]. However, the dihedral angles change considerably depending on the method used and on the system studied (see Table 15). For the 6-yl radical, energy minima always correspond to an approximately half-chair structure of the six-membered ring. In this structure the methyl group occupies an equatorial position and the Hβ hydrogen atom an axial position. It is noteworthy that these conformational features are identical to those found in 5,6-dihydrothymine [151]. Since formation of the 6-yl radical is the initial step of the reaction leading to 5,6-dihydrothymine, the conformational features of the product could already be determined in the first reaction step.
On the other hand, some geometrical parameters of the 5-yl radical depend strongly on the computational model. In particular, LSD calculations lead to a planar ring structure, whereas B3LYP and MP2 computations lead to significantly pyramidal environments for both the N1 and C5 atoms. The two hydrogens bonded to the C6 carbon atom (Hβ1 and Hβ2) are equivalent in a planar ring but not in a half-chair ring. The same structural modifications also have a significant impact on computed electronic properties. An example is the isotropic hyperfine couplings, which are very sensitive to geometry modifications at or near the radical center [152].

Table 15. Geometrical parameters (Å and degrees) for the 5-yl and 6-yl radicals calculated at different levels of theory.

                      UHF      ROHF     MP2      B3LYP    LDA
6-yl
C5-C4-N3-C2            3.8      3.6      4.1      5.5      4.8
Hβ-C5-C4-N3           92.2     91.5     91.3     95.6     96.1
CMe-C5-C4-N3        -150.8   -151.3   -151.9   -149.2   -149.4
C6-N1-C2-N3           11.9     12.8      9.8      4.8      3.7
5-yl
C5-C4-N3-C2           12.1     10.0     13.7      8.4      2.2
CMe-C5-C4-N3         177.6    176.2    178.2    179.3   -177.5
C6-N1-C2-N3          -17.6    -17.5    -21.3    -13.4      3.9
Hβ1-C6-N1-C2         -89.8    -89.5    -82.1    -98.8   -123.3
Hβ2-C6-N1-C2         152.7    153.0    160.1    144.8    121.1
Starting from the above observations, it is important to consider the energy profile for the inversion of the half-chair structure of the two compounds (figure 19). The figure also shows the anharmonic vibrational levels supported by both potentials. The potential energy curve governing the inversion of the 5-yl radical has two equivalent energy minima, corresponding to the enantiomeric forms of the half-chair structure. They are separated by a planar transition state (TS). However, proper consideration of the inversion motion leads to an average planar geometry. The reason is that the ground vibrational level lies above the potential barrier (0.3 kJ/mol) and is peaked at the planar structure. The inversion potential of the 6-yl radical also has two energy minima, but they are no longer equivalent. The most stable structure (referred to as 6-yl-I in figure 18) has an equatorial methyl group and an axial Hβ hydrogen. The second energy minimum (referred to as 6-yl-II in figure 18) is less stable by 1.6 kJ/mol and has an axial methyl group and an equatorial Hβ hydrogen. The potential energy barrier separating the two minima is 2.9 kJ/mol from 6-yl-I to 6-yl-II, and the TS again corresponds to a planar ring.
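The 5-yl argument is that the ground level lies above a tiny barrier, so vibrational averaging makes the two β-hydrogens equivalent. This can be illustrated with a one-dimensional grid calculation along a mass-weighted inversion coordinate. The sketch below is not the DiNa procedure used for Table 17. Only the 0.3 kJ/mol barrier comes from the text. The quartic double-well shape, the minimum position q_min = 0.5 amu^1/2 bohr and the linear hcc profiles a1(q), a2(q) are hypothetical choices made for illustration.

```python
import numpy as np

HARTREE_TO_KJMOL = 2625.4996       # 1 hartree in kJ/mol
AMU_TO_ME = 1822.888486            # 1 amu in electron masses
KB_HARTREE_PER_K = 3.166811563e-6  # Boltzmann constant in hartree/K


def double_well(q, barrier_kjmol, q_min):
    """Hypothetical symmetric quartic double well (hartree), minima at +/- q_min."""
    vb = barrier_kjmol / HARTREE_TO_KJMOL
    return vb * ((q / q_min) ** 2 - 1.0) ** 2


def solve_1d(q, potential):
    """Eigenpairs of H = -1/2 d^2/dQ^2 + V(Q) by second-order finite differences.

    q is a uniform grid in amu^1/2 bohr; Q = q * sqrt(AMU_TO_ME) is the same
    coordinate in atomic units, where the kinetic prefactor is exactly 1/2.
    The wavefunction is implicitly zero outside the grid (box boundaries).
    """
    dQ = (q[1] - q[0]) * np.sqrt(AMU_TO_ME)
    t = 1.0 / (2.0 * dQ ** 2)
    n = q.size
    h = np.diag(potential + 2.0 * t) - t * np.eye(n, k=1) - t * np.eye(n, k=-1)
    return np.linalg.eigh(h)  # energies ascending, unit-norm eigenvectors in columns


def thermal_average(energies, vecs, prop, temperature, nlev=10):
    """Boltzmann average over the lowest nlev levels of <psi_v|prop(q)|psi_v>."""
    level_means = (vecs[:, :nlev] ** 2 * prop[:, None]).sum(axis=0)
    if temperature == 0:
        return float(level_means[0])
    de = energies[:nlev] - energies[0]
    weights = np.exp(-de / (KB_HARTREE_PER_K * temperature))
    return float((weights * level_means).sum() / weights.sum())


q = np.linspace(-3.0, 3.0, 601)  # amu^1/2 bohr
v = double_well(q, barrier_kjmol=0.3, q_min=0.5)
energies, vecs = solve_1d(q, v)

# Hypothetical linear hcc profiles (G) for the two beta hydrogens, a1(q) = a2(-q)
a1 = 33.0 + 8.0 * q
a2 = 33.0 - 8.0 * q

zpe_kjmol = (energies[0] - v.min()) * HARTREE_TO_KJMOL
print(f"ground level above the well bottom: {zpe_kjmol:.2f} kJ/mol (barrier 0.3)")
for temp in (0, 77, 298):
    print(f"{temp:3d} K: <a1> = {thermal_average(energies, vecs, a1, temp):.3f} G, "
          f"<a2> = {thermal_average(energies, vecs, a2, temp):.3f} G")
```

With these hypothetical parameters the ground level lies above the barrier. The two averaged couplings also coincide at every temperature, because each |ψ_v|² is even in a symmetric well. An asymmetric potential (as for 6-yl) would break this equality.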
Figure 18. Structures and atom labeling for the 5-yl, 6-yl-I and 6-yl-II radicals.
Thus the potential curve of the 6-yl radical is an asymmetric double well, and the first vibrational levels are localized inside the potential wells. However, crystal constraints or other environmental effects could significantly affect the small energy barrier between conformers. As a consequence, complete equilibration between the two energy minima of the 6-yl radical may or may not occur, depending on the temperature and on the origin of the radical.
A large body of experimental EPR data is available for the 6-yl and, especially, the 5-yl radical. The experimental isotropic hcc's of Hα and Hβ found in the literature are reported in Table 16. Computations performed at the B3LYP/EPR-II level give the values collected in Table 17. Two sets of experimental data are reported for the 6-yl radical. The first refers to the radical generated by irradiation of a crystal of 5,6-dihydrothymine at 77 K [148]. The second was obtained upon irradiation of a crystal of thymine [149]. In the former case the splitting of the β-hydrogen is very large (44.0 G), suggesting that the C5-Hβ bond is nearly perpendicular to the molecular plane.
[Figure 19 plot: two panels, energy profiles with vibrational levels vs. LAC (amu^1/2 bohr).]
Figure 19. Energy profiles and vibrational levels along the LAC for the 5-yl and 6-yl radicals.
Table 16. Experimental hcc's (G) for the 5-yl and 6-yl radicals.

6-yl radical     Hα       Hβ
ref. 148         17.4     44.0
ref. 149         28.3     17.1

5-yl radical     Hβ1      Hβ2
ref. 144         34.5     34.5
ref. 145         37.7     37.7
ref. 146         39.0     41.0
ref. 147         34.1     43.1

Computations performed on the 6-yl-I radical (axial β-hydrogen) give hyperfine coupling constants close to the experimental data (see Tables 16 and 17). The error is lower than 2% for the α-hydrogen and lower than 9% for the β-hydrogens.

Table 17. Hcc's (G) of the 5-yl and 6-yl radicals calculated for different structures and averaged by the DiNa procedure.

6-yl radical   6-yl-I                 6-yl-II                averaged
               EPR-II   EPR-II/PCM    EPR-II   EPR-II/PCM    EPR-II(a)  EPR-II(b)  EPR-II(c)
H(CH3)         weak     weak          weak     weak          weak       weak       weak
Hα            -17.2    -16.4         -17.6    -18.2         -17.4      -17.5      -18.4
Hβ             43.4     43.4          15.6     16.7          43.4       40.5       33.8

5-yl radical   minimum    planar                 averaged
               EPR-II     EPR-II   EPR-II/PCM    EPR-II(a)  EPR-II(b)  EPR-II(c)
H(CH3)         21.2       21.3     21.6          21.3       21.3       21.3
Hβ1            25.4       34.0     35.1          33.0       32.7       32.3
Hβ2            40.0       34.0     35.1          33.0       32.7       32.3

(a) Averaged at 0 K; (b) at 77 K; (c) at 298 K.
On the other hand, when the same radical is generated from thymine, a much lower hcc is observed for Hβ, suggesting that in this case the C5-Hβ bond is closer to the average plane of the molecule. The hcc calculated for the β-hydrogen of the 6-yl-II radical is compatible with the experimental value obtained upon irradiation of the thymine crystal. The corresponding experimental value for the α-hydrogen is significantly larger than the usual splittings observed in alkyl radicals [153]. Henriksen et al. [149] argued that this may be the consequence of a 'slight change in hybridization' of the carbon atom carrying the unpaired electron in a pπ orbital. Nevertheless, the computed hcc for Hα is about 40% smaller than the experimental value. The theoretical results can be explained by considering that Hα adopts an equatorial orientation with respect to the mean molecular plane in both structures of the 6-yl radical. As a result, the theoretical Hα hcc's are similar in both cases (17-18 G).
A vibrational treatment with temperature averaging gives values compatible, up to 77 K, with the experimental hcc's of the 6-yl radical obtained by irradiation of 5,6-dihydrothymine crystals [148] (Table 17). This corresponds to a preferential 6-yl-I conformation, with only the ground vibrational level significantly populated. In other words, the radical conformation is trapped by crystal constraints. At higher temperatures, Boltzmann averaging increases the contribution of the 6-yl-II conformer to the hcc values as higher vibrational levels become populated. In any case, consider the radical observed upon irradiation of thymine crystals. The experimental value proposed for its Hα hcc appears abnormally high compared with the usual α-hydrogen couplings in free radicals [153]. It could be accounted for only by a strong pyramidalization of the radical center.
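A crude two-conformer Boltzmann estimate shows how population of 6-yl-II lowers the averaged Hβ coupling as temperature rises. This is a simplification, not the vibrational (DiNa) treatment behind Table 17. It assumes that only the two minima contribute, with equal partition functions. It uses the 1.6 kJ/mol energy gap quoted above and the EPR-II Hβ couplings of 6-yl-I (43.4 G) and 6-yl-II (15.6 G).

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol K)


def two_state_average(a_low, a_high, delta_e_kjmol, temperature):
    """Boltzmann-weighted coupling for two conformers separated by delta_e_kjmol.

    Assumes equal partition functions for both conformers (no vibrational structure).
    Returns (fraction of the higher-energy conformer, averaged coupling).
    """
    ratio = math.exp(-delta_e_kjmol / (R_KJ * temperature))
    x_high = ratio / (1.0 + ratio)
    return x_high, (1.0 - x_high) * a_low + x_high * a_high


for T in (77.0, 298.0):
    x, a_beta = two_state_average(43.4, 15.6, 1.6, T)
    print(f"T = {T:5.1f} K: x(6-yl-II) = {x:.3f}, <a(Hbeta)> = {a_beta:.1f} G")
```

This gives about 41.3 G at 77 K and 33.8 G at 298 K. For comparison, the DiNa-averaged values in Table 17 are 40.5 G and 33.8 G. The two-state model ignores the vibrational level structure, so agreement at any single temperature should not be over-interpreted.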
By contrast, suppose the experimental assignment is reversed, i.e., 17.1 G for a(Hα) and 28.3 G for a(Hβ). The couplings are then compatible with the values computed at 298 K (Table 17). In this hypothesis, the radical could be created in either the 6-yl-I or the 6-yl-II conformation. It would give an α-hydrogen hcc of about 18 G and a β-hydrogen hcc of 34 G. Furthermore, the energy difference between the two conformers might be somewhat lower in the crystal than our theoretical estimate. In that case a value between 15.6 and 33.8 G for the Hβ hcc would be reasonable. In conclusion, the mode of formation of the 6-yl radical determines whether a large or a medium β-hydrogen coupling is observed. A large α-coupling together with a small β-coupling should never be expected.
Coming to the 5-yl radical, EPR measurements were performed on different samples, including thymine, thymidine, and DNA fragments. The average isotropic hcc of the methyl hydrogens is close to 20 G in all the reported studies [144-147]. However, different hcc's were observed for the two β-hydrogens (Hβ1 and Hβ2, see Table 16). In some cases [144,145] both hydrogens are equivalent and |a(Hβ)| = 34.5, 37.7, or 37.5 G. In other cases [146,147], namely for spectra of frozen solutions or crystal samples, the two hydrogens are not equivalent. This suggests that under these conditions specific interactions can induce structural constraints that favor a half-chair conformation of the pyrimidine ring.
The ground state of the 5-yl radical has two nonequivalent β-hydrogens (a(Hβ1) = 25.4 G and a(Hβ2) = 40.0 G). The value of a(Hβ2) is in good agreement with experimental results, but the hcc of the other β-hydrogen is underestimated by about 30%. On the other hand, both hydrogens become equivalent in a planar structure, leading to a single hcc value. Consider first the group of experimental studies that give an hcc of 34.5 G for both β-hydrogens. Here the difference between theoretical and experimental results is 4%, and it drops to 2% when the DPCM is used to take solvent effects into account. The agreement is worse for the second set of experimental hcc values (EPR-II: 9%, EPR-II/DPCM: 6%). Figure 20b shows the evolution of the hyperfine splittings connected to the out-of-plane displacement of the C6 atom along the inversion path. In the ground-state structure the two hydrogens of the methylene group are not equivalent. Starting from this structure, the vibrational treatment leads to identical coupling constants for both β-hydrogens at 0, 77, and 298 K (Table 17). The computed hcc's of the equilibrium structure increase by about 1 G on going from vacuum to aqueous solution. Taking this into account, our best estimate is 33.4 G, with a maximum error of 10% relative to the different experimental values. These considerations make it quite evident that the EPR spectrum of the 5,6-dihydro-5-thymyl radical should correspond to an effectively planar species with two equivalent β-hydrogens.
Our theoretical treatment cannot explain the experiments that give nonequivalent protons. This can be understood by considering that the present model does not take into account the exact experimental conditions.
5.2.3 Pyrrolidine-1-oxyl and imidazoline-1-oxyl radicals
Owing to their stability [154] and to the localization of the unpaired electron in the NO moiety [155], nitroxides are undoubtedly the organic free radicals most used as 'spin labels' and 'spin probes' [156]. They are used to study the structure and dynamics of microheterogeneous biological and macromolecular systems such as micelles [159], proteins [160], vesicles and lipid bilayers [161]. The magnitude of the nitrogen hcc depends markedly on the features (mainly the polarity) of the nitroxide's environment [162]. These molecules are therefore ideal 'probes' of the medium in which they are embedded. They give valuable information on the polarity, the hydrogen-bonding power and the pH of the solvent, on the presence of other free radicals, and so on. Furthermore, the line widths of the EPR spectra (and the effective rotational correlation time τc) are controlled by the rotational and lateral diffusion of the nitroxide. They can therefore give valuable information on the viscosity, the order and the temperature of the spin probe environment [163]. This has stimulated several recent theoretical studies on nitroxides, both as spin probes [133] and as molecular magnets [164]. Pyrrolidine-1-oxyl (hereafter Proxyl) belongs to the class of five-membered ring nitroxides, which have been most used as spin probes (see figure 20) [156].
Figure 20. Structure of the proxyl and imidoxyl radicals.
Proxyl radicals unsubstituted at the Cα atoms are very unstable. The experimental reference for our study is therefore provided by the diffraction data of proxyl radicals bearing bulky substituents [157]. Calculations performed at the PBE0/6-31G(d,p) level on the free molecule give bond lengths close to the corresponding experimental values. Calculated and experimental bond angles also agree well. The value of a(N) critically depends on the competition between a pyramidal (sp3-like hybridization) and a planar (sp2-like hybridization) geometry around the nitrogen atom [158]. It is therefore worth noting that all the calculated out-of-plane angles are close to their experimental counterparts.
The available data for Proxyl include EPR spectra recorded in two different solvents: CH2Cl2 (a(N) = 14.9 G) [166a] and water (a(N) = 16.7 G) [166b], a solvent shift of +1.8 G. The effect of electronic polarization due to the bulk properties of the solvent can be investigated with the CPCM method. Spin densities are determined in CH2Cl2 and in H2O for the geometries optimized in vacuo. The resulting solvent shift of a(N) is just 0.69 G at the PBE0/EPR-II/CPCM level [164], remarkably lower than the experimental value (1.8 G). This suggests that in aqueous solution the interactions between the nitroxide moiety and specific solvent molecules must be taken into account. Indeed, spectroscopic observations clearly show that two water molecules are strongly and specifically bound to the nitroxide oxygen [167]. The bulk effect and specific interactions can be included simultaneously by considering adducts formed by the nitroxide and two water molecules in the continuum medium. For such a structure, optimized at the PBE0/6-31G(d,p) level, the PBE0/EPR-II/CPCM hcc's give a solvent shift in good agreement with experiment (1.67 vs. 1.8 G).
Imidazoline-1-oxyl (Imidoxyl) and its analogues are other five-membered ring nitroxides. They possess a protonatable site (N3 in figure 20), which gives a marked dependence of a(N) on the pH of the embedding medium [165]. They are thus very suitable for accurate pH measurements in cellular membranes, vesicles and micelles, and for determining pH gradients across lipid membranes [165]. EPR spin probing by nitroxides has the advantage of high sensitivity (very low probe concentrations are needed). Unlike traditional optical probes, it does not require optical transparency.
The PBE0/EPR-II a(N) values of imidoxyl, together with the experimental pK of the tetramethylated analogues, have been used to build the titration curve shown in Figure 21. The curve behaves like those determined experimentally for other protonatable five-membered cyclic nitroxides. The plot is sigmoidal, with a sudden drop of a(N) near the experimental pK. Nitroxides of this class could therefore provide effective intracellular pH indicators, and the calculations are able to reproduce this important environmental effect as well. This is a rather interesting result. A reliable calculation of a(N) vs. pH could allow the pH dependence of the hcc to be predicted also for unstable radicals that are not easily accessible to experiment.
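The sigmoidal shape follows from a simple assumption. If protonation at N3 is fast on the EPR time scale, the observed a(N) is the population-weighted average of the couplings of the protonated and neutral forms. The protonated fraction is given by the Henderson-Hasselbalch relation. The sketch below works under that fast-exchange assumption. PK, A_H and A_0 are hypothetical placeholders, not the PBE0/EPR-II couplings or the experimental pK used for Figure 21.

```python
def protonated_fraction(ph: float, pk: float) -> float:
    """Fraction of nitroxide protonated at N3 (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pk))


def a_n_vs_ph(ph: float, pk: float, a_protonated: float, a_neutral: float) -> float:
    """Fast-exchange, population-weighted nitrogen coupling (G)."""
    f = protonated_fraction(ph, pk)
    return f * a_protonated + (1.0 - f) * a_neutral


# Hypothetical placeholder values (pK, G, G), used only to draw a titration curve
PK, A_H, A_0 = 1.5, 10.30, 10.50
for ph in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"pH {ph:.1f}: a(N) = {a_n_vs_ph(ph, PK, A_H, A_0):.3f} G")
```

The curve changes most steeply within about one pH unit of the pK, where a(N) passes through the midpoint of the two limiting couplings.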
Finally, it is worth noting what happens if a(N) is calculated using geometries optimized in vacuo. In that case, the relative order of the values for the protonated and neutral imidoxyl is opposite to that found experimentally for their tetramethyl derivatives [165b]. This result strongly underlines the importance of performing geometry optimizations in solution when comparing the magnetic properties of different species.
[Figure 21 plot: nitrogen hcc (G), axis range about 10.25 to 10.55 G, vs. pH from 0.0 to 3.0.]
Figure 21. Nitrogen hcc's of the imidoxyl radical in aqueous solution as a function of pH.
CONCLUDING REMARKS
The first part of the present chapter sketches a general picture of the building blocks (involving both electronic and nuclear degrees of freedom) needed to describe elementary processes involving biomolecular species in condensed phases. The second part is devoted to a number of applications, selected for illustrative purposes rather than as a comprehensive compilation of results.
The first point worth mentioning is that density functional theory, though open to further developments, provides us hic et nunc with simple, direct and quantitative tools for the theoretical study of elementary processes involving biomolecular systems. The approach combines reliable results, computational speed and analytical first and second derivatives (both in vacuo and in solution). This allows the characterization of the most significant parts of complex potential energy surfaces. Furthermore, its single-determinant nature makes the description clear and easily interpretable.
The second aspect is that the field of application of this tool can be further extended using the general philosophy of system-bath decompositions. The system includes the region where the essential part of the process under investigation is localized. It may also include the few solvent molecules that interact strongly (and specifically) with that region. This part is treated at the electronic level of resolution and is immersed in a polarizable continuum mimicking the macroscopic properties of the solvent. Recent implementations of these methods in different quantum mechanical frameworks allow us to investigate how interactions with the environment modify both the structure and the physico-chemical properties of large solutes. The mechanical and electrostatic effects of more distant parts of the solute can be taken into account at the molecular mechanics level, leading to very effective QM/MM/PCM methods. These approaches are at the heart of dynamical models of physico-chemical properties and reactivity. Such models have been only briefly sketched in this contribution and deserve, in our opinion, increased attention and technical development.
In conclusion, the most significant outcome of this and related studies is that a quantitative description of reactive processes of biomolecules in solution is becoming more and more feasible. This is thanks to the development of powerful new models and to their effective implementation in user-friendly integrated packages that are, or will shortly be, available also to non-specialists.
ACKNOWLEDGEMENTS
It is a pleasure to thank Dr. Roberto Improta and Dr. Giovanni Scalmani for computational aid and careful reading of the manuscript. The financial support of the Italian Research Council (CNR) is also gratefully acknowledged.
REFERENCES
1. R.G. Parr and W. Yang, Density Functional Theory of Atoms and Molecules (Oxford University Press, New York, 1989).
2. Density Functionals: Theory and Applications, vol. 500 of Lecture Notes in Physics, edited by D.P. Joubert (Springer Verlag, Berlin, 1998).
3. T. Ziegler, Chem. Rev. 91 (1991) 651.
4. P. Hohenberg and W. Kohn, Phys. Rev. B 136 (1964) 864.
5. W. Kohn and L.J. Sham, Phys. Rev. A 140 (1965) 1133.
6. P.M.W. Gill, B.G. Johnson and J.A. Pople, Chem. Phys. Lett. 209 (1993) 506.
7. (a) C.A. White and M. Head-Gordon, J. Chem. Phys. 101 (1994) 6593; (b) M.C. Strain, G.E. Scuseria and M.J. Frisch, Science 271 (1996) 51.
8. J.M. Millam and G.E. Scuseria, J. Chem. Phys. 106 (1997) 5569; 107 (1997) 425.
9. J.M. Foster and F. Weinhold, J. Am. Chem. Soc. 102 (1980) 7211.
10. J. Cioslowski, J. Am. Chem. Soc. 111 (1989) 8333.
11. R.F. Bader, Atoms in Molecules. A Quantum Theory (Oxford University Press, Oxford, U.K., 1990).
12. A.D. Becke, J. Chem. Phys. 85 (1986) 7184.
13. J.P. Perdew and Y. Wang, Phys. Rev. B 33 (1986) 8800.
14. A.D. Becke, Phys. Rev. B 38 (1988) 3098.
15. J.P. Perdew, in Proceedings of the 21st Annual International Symposium on the Electronic Structure of Solids, edited by P. Ziesche and H. Eschrig (Akademie Verlag, Berlin, 1991).
16. P.M.W. Gill, Mol. Phys. 89 (1996) 433.
17. E.I. Proynov and D.R. Salahub, Phys. Rev. B 49 (1994) 7874.
18. F.A. Hamprecht, A.J. Cohen, D.J. Tozer and N.C. Handy, J. Chem. Phys. 109 (1998) 6264.
19. C. Lee, W. Yang and R.G. Parr, Phys. Rev. B 37 (1988) 785.
20. A.D. Becke, J. Chem. Phys. 107 (1997) 8554; 109 (1998) 2092.
21. R.A. Adamson, P.M.W. Gill and J.A. Pople, Chem. Phys. Lett. 284 (1998) 6.
22. C. Adamo and V. Barone, J. Chem. Phys. 110 (1999) 6158.
23. C. Adamo, V. Barone and G.E. Scuseria, J. Chem. Phys. 111 (1999) 2889.
24. C. Adamo, M. Cossi, G. Scalmani and V. Barone, Chem. Phys. Lett. 307 (1999) 265.
25. C. Adamo and V. Barone, Chem. Phys. Lett. 298 (1998) 242.
26. Y. Zhang, W. Pan and W. Yang, J. Chem. Phys. 107 (1997) 7921.
27. D.C. Patton, D.V. Porezag and M.R. Pederson, Phys. Rev. B 55 (1997) 7454.
28. J.P. Perdew, K. Burke and M. Ernzerhof, Phys. Rev. Lett. 77 (1996) 3865; ibid. 78 (1997) 1396 (E).
29. Y. Zhang and W. Yang, Phys. Rev. Lett. 80 (1998) 890.
30. S. Kurth, J.P. Perdew and P. Blaha, Phys. Rev. B, in press.
31. B. Hammer, L.B. Hansen and J.K. Nørskov, Phys. Rev. B 59 (1999) 7413.
32. T. Tsuneda, T. Suzumura and K. Hirao, J. Chem. Phys. 111 (1999) 5656.
33. J.P. Perdew, M. Ernzerhof, A. Zupan and K. Burke, J. Chem. Phys. 108 (1998) 1522.
34. A. Zupan, K. Burke, M. Ernzerhof and J.P. Perdew, J. Chem. Phys. 106 (1997) 10184.
35. E.H. Lieb and S. Oxford, Int. J. Quantum Chem. 19 (1981) 427.
36. J.P. Perdew and Y. Wang, Phys. Rev. B 45 (1992) 13244.
37. A.D. Becke, J. Chem. Phys. 84 (1986) 4524.
38. C. Adamo and V. Barone, J. Chem. Phys. 108 (1998) 664.
39. L.A. Curtiss, K. Raghavachari and J.A. Pople, J. Chem. Phys. 98 (1993) 1293.
40. B.G. Johnson, P.M.W. Gill and J.A. Pople, J. Chem. Phys. 98 (1993) 8765.
41. J.P. Perdew, M. Ernzerhof and K. Burke, Phys. Rev. Lett. 80 (1998) 891.
42. A.D. Becke, J. Chem. Phys. 98 (1993) 1372.
43. S.H. Vosko, L. Wilk and M. Nusair, Can. J. Phys. 58 (1980) 1200.
44. A.D. Becke, J. Chem. Phys. 104 (1996) 1040.
45. O. Gunnarsson and B.I. Lundqvist, Phys. Rev. B 13 (1976) 4274.
46. J.P. Perdew, M. Ernzerhof and K. Burke, J. Chem. Phys. 105 (1996) 9982.
47. C. Adamo and V. Barone, Chem. Phys. Lett. 274 (1997) 242.
48. C. Adamo and V. Barone, J. Comput. Chem. 19 (1998) 418.
49. C. Adamo and V. Barone, Chem. Phys. Lett. 298 (1998) 113.
50. J.M. Pérez-Jordá and A.D. Becke, Chem. Phys. Lett. 233 (1995) 134.
51. T.A. Wesolowski, O. Parisel, Y. Ellinger and J. Weber, J. Phys. Chem. A 101 (1997) 7818.
52. J.F. Ogilvie and F.Y.H. Wang, J. Mol. Struct. 273 (1992) 277.
53. S.F. Boys and F. Bernardi, Mol. Phys. 19 (1970) 553.
54. C. Adamo and V. Barone, Chem. Phys. Lett. 314 (1999) 152.
55. G. Scalmani, J. Bredas and V. Barone, J. Chem. Phys. 112 (2000) 1178.
56. C. Adamo, M. Cossi and V. Barone, Theochem 493 (1999) 145.
57. K. Burke, J.P. Perdew and Y. Wang, in Electronic Density Functional Theory: Recent Progress and New Derivations, edited by J.F. Dobson, G. Vignale and M.P. Das (Plenum Press, New York, 1997).
58. D.L. Novikov, A.J. Freeman, N.E. Christensen and A. Svane, Phys. Rev. B 56 (1997) 7206.
59. S. Kurth and J.P. Perdew, Phys. Rev. B, in press.
60. J.P. Perdew, Phys. Rev. Lett. 55 (1985) 1665.
61. S.K. Ghosh and R.G. Parr, Phys. Rev. A 34 (1986) 785.
62. J.P. Perdew, S. Kurth, A. Zupan and P. Blaha, Phys. Rev. Lett. 82 (1999) 2544.
63. T. van Voorhis and G.E. Scuseria, J. Chem. Phys. 109 (1998) 400.
64. A.D. Becke, J. Chem. Phys. 109 (1998) 2092.
65. E.I. Proynov, E. Ruiz, A. Vela and D.R. Salahub, Int. J. Quantum Chem. 29 (1995) 61.
66. P.S. Svendsen and U. von Barth, Phys. Rev. B 54 (1996) 17402.
67. C. Adamo, M. Ernzerhof and G.E. Scuseria, J. Chem. Phys. 112 (2000) 2643.
68. M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman, V.G. Zakrzewski, J.A. Montgomery, R.E. Stratmann, J.C. Burant, S. Dapprich, J.M. Millam, A.D. Daniels, K.N. Kudin, M.C. Strain, O. Farkas, J. Tomasi, V. Barone, M. Cossi, R. Cammi, B. Mennucci, C. Pomelli, C. Adamo, S. Clifford, J. Ochterski, G.A. Petersson, P.Y. Ayala, Q. Cui, K. Morokuma, D.K. Malick, A.D. Rabuck, K. Raghavachari, J.B. Foresman, J. Cioslowski, J.V. Ortiz, B.B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. Gomperts, R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham, C.Y. Peng, A. Nanayakkara, C. Gonzalez, M. Challacombe, P.M.W. Gill, B. Johnson, W. Chen, M.W. Wong, J.L. Andres, M. Head-Gordon, E.S. Replogle and J.A. Pople, Gaussian 99, Development Version (Revision B.6) (Gaussian, Inc., Pittsburgh PA, 1999).
69. W. Weltner, Jr., Magnetic Atoms and Molecules (Dover, New York, 1989).
70. V. Barone, C. Minichino, A. Grand and R. Subra, J. Chem. Phys. 99 (1993) 6787.
71. E. Hirota and C. Yamada, J. Mol. Spectrosc. 96 (1985) 175.
72. R.W. Fessenden and R.H. Schuler, J. Chem. Phys. 39 (1963) 2147.
73. C. Adamo, V. Barone and A. Fortunelli, J. Phys. Chem. 98 (1994) 8648.
74. C. Adamo, V. Barone and A. Fortunelli, J. Chem. Phys. 102 (1995) 384.
75. D. Feller, E.D. Glendening, W.A. McCullough and R.J. Miller, J. Chem. Phys. 99 (1993) 2829.
76. E. Hirota, C. Yamada and M. Okunishi, J. Chem. Phys. 97 (1992) 2963.
77. D. Feller and E.R. Davidson, J. Chem. Phys. 80 (1984) 4225.
78. V. Barone and C. Adamo, Chem. Phys. Lett. 224 (1994) 432; ibid. 228 (1994) 499 (E).
79. H.U. Suter and B. Engels, J. Chem. Phys. 100 (1994) 2936.
80. L.B. Knight and J. Steadman, J. Chem. Phys. 80 (1984) 1018.
81. V. Barone, Theor. Chim. Acta 91 (1995) 113.
82. J.R. Cheeseman, G.W. Trucks, T.A. Keith and M.J. Frisch, J. Chem. Phys. 104 (1998) 5497.
83. G. Schreckenbach and T. Ziegler, J. Chem. Phys. 99 (1995) 606.
84. G. Rauhut, S. Puyear, K. Wolinski and P. Pulay, J. Phys. Chem. 100 (1996) 6130.
85. V.G. Malkin, O.L. Malkina, M.E. Casida and D.R. Salahub, J. Chem. Phys. 116 (1994) 5898.
86. M. Bühl, Chem. Phys. Lett. 267 (1997) 251.
87. C. Adamo, A. di Matteo and V. Barone, Adv. Quantum Chem. 36 (1999) 45.
88. V. Barone, unpublished results.
89. B.J. Krohn, W.C. Ermler and C.W. Kern, J. Chem. Phys. 60 (1974) 22.
90. V. Barone, in Recent Advances in Density Functional Methods (Part I), D.P. Chong, Ed. (Scientific Publishing Co., 1995) 287.
91. W.H. Miller, N.C. Handy and J.E. Adams, J. Chem. Phys. 72 (1980) 99.
92. K. Müller, Angew. Chem. Int. Ed. Engl. 19 (1980) 1.
93. C. Zhixing, Theor. Chim. Acta 75 (1989) 489.
94. V. Barone and C. Minichino, Theochem 330 (1995) 365.
95. V. Barone, to be published.
96. V. Barone, A. Grand, C. Minichino, R. Subra, J. Chem. Phys. 99 (1993) 6787.
97. D.M. Chipman, Theor. Chim. Acta 76 (1989) 73; J. Chem. Phys. 54 (1989) 55.
98. J. Tomasi, M. Persico, Chem. Rev. 94 (1994) 2027.
99. C.J. Cramer, D.G. Truhlar, Chem. Rev. 99 (1999) 2161.
100. J.-L. Rivail, D. Rinaldi, in Computational Chemistry: Review of Current Trends, J. Leszczynski, Ed. (World Scientific, New York, 1995).
101. S. Miertus, E. Scrocco, J. Tomasi, Chem. Phys. 55 (1981) 117.
102. M. Cossi, V. Barone, J. Tomasi, Chem. Phys. Lett. 255 (1996) 327.
103. V. Barone, M. Cossi, J. Phys. Chem. A 102 (1998) 1995.
104. E. Cancès, B. Mennucci, J. Tomasi, J. Chem. Phys. 107 (1997) 3032.
105. M. Cossi, V. Barone, B. Mennucci, J. Tomasi, Chem. Phys. Lett. 286 (1998) 253.
106. M. Cossi, V. Barone, J. Chem. Phys. 109 (1998) 6246.
107. C. Amovilli, V. Barone, R. Cammi, E. Cancès, M. Cossi, B. Mennucci, C.S. Pomelli, J. Tomasi, Adv. Quantum Chem. 32 (1998) 227.
108. M. Svensson, S. Humbel, R. Froese, T. Matsubara, S. Sieber, K. Morokuma, J. Phys. Chem. 100 (1996) 1200.
109. B. Mennucci, E. Cancès, J. Tomasi, J. Phys. Chem. A 101 (1998) 10506.
110. B. Mennucci, R. Cammi, J. Tomasi, J. Chem. Phys. 100 (1998) 6858.
111. G. Schuurmann, M. Cossi, V. Barone, J. Tomasi, J. Phys. Chem. A 102 (1998) 6706.
112. N. Rega, M. Cossi, V. Barone, J. Comput. Chem. 20 (1998) 1186.
113. M. Cossi, V. Barone, M.A. Robb, J. Chem. Phys. 111 (1999) 5295.
114. Pascual-Ahuir, E. Silla, I. Tuñón, J. Comput. Chem. 15 (1994) 1127.
115. C.S. Pomelli, J. Tomasi, J. Comput. Chem. 19 (1998) 1758.
116. V. Barone, M. Cossi, J. Tomasi, J. Chem. Phys. 107 (1997) 3210.
117. N. Rega, M. Cossi, V. Barone, J. Am. Chem. Soc. 119 (1997) 12692; 120 (1998) 5723.
118. I. Pettersson, T. Liljefors, J. Comput. Aided Mol. Design 1 (1987) 143.
119. M. Bengtsson, T. Liljefors, B.S. Hansson, Bioorg. Chem. 15 (1987) 409.
120. F. Gregoire, S.H. Wei, E.W. Streed, K.A. Brameld, D. Fort, L.J. Hanely, W.A. Goddard, J.D. Roberts, J. Am. Chem. Soc. 120 (1998) 7537.
121. R.J. Abraham, P. Loftus, A.W. Thomas, Tetrahedron 33 (1977) 1227.
122. P. Aadal Nielsen, P.O. Norrby, T. Liljefors, N. Rega, V. Barone, J. Am. Chem. Soc., in press.
123. P.F. Ford, P. Wang, J. Am. Chem. Soc. 114 (1992) 10563.
124. N. Gresh, A. Pullman, P. Claverie, Theor. Chim. Acta 67 (1985) 11.
125. C. Altona, R. Francke, R. de Haan, J.H. Ippel, G.J. Daalmans, A.J.A. Westra Hoekzema, J. van Wijk, Magn. Reson. Chem. 32 (1994) 670.
126. P. Reichard, A. Ehrenberg, Science 221 (1983) 514.
127. C.W. Hoganson, G.T. Babcock, Biochemistry 31 (1992) 11874.
128. V. Volker, A.F. Wagner, M. Frey, F.A. Neugerbauer, J. Knappe, Proc. Natl. Acad. Sci. U.S.A. 89 (1992) 996.
129. E. Mulliez, M. Fontecave, J. Gaillard, P. Reichard, J. Biol. Chem. 268 (1993) 2296.
130. P. Neta, R.W. Fessenden, J. Phys. Chem. 75 (1971) 738.
131. E. Hayon, T. Ibata, N.N. Lichtin, M. Simic, J. Am. Chem. Soc. 93 (1971) 5388.
132. R. Livingston, D.G. Doherty, H. Zeldes, J. Am. Chem. Soc. 97 (1975) 3198.
133. P. Baglioni, M.F. Ottaviani, G. Martini, E. Ferroni, in Surfactants in Solution, K.L. Mittal, B. Lindman, Eds. (Plenum, New York, 1984) Vol. 1, p. 542.
134. D. Yu, A. Rauk, D.A. Armstrong, J. Am. Chem. Soc. 117 (1995) 1789.
135. D.A. Armstrong, A. Rauk, D. Yu, J. Chem. Soc., Perkin Trans. 2 (1995) 553.
136. Y. Ding, K. Krogh-Jespersen, Chem. Phys. Lett. 199 (1992) 261.
137. D. Yu, D.A. Armstrong, A. Rauk, Can. J. Chem. 70 (1992) 1762.
138. D.K. Ghosh, D.H. Whiffen, J. Chem. Soc. (1960) 1869.
139. V. Barone, C. Adamo, F. Lelj, J. Chem. Phys. 102 (1995) 364.
140. C. Easton, Chem. Rev. 97 (1997) 53.
141. V. Barone, C. Adamo, A. Grand, R. Subra, Chem. Phys. Lett. 242 (1995) 351.
142. (a) A. Ehrenberg, L. Ehrenberg, G. Löfroth, Nature (London) 200 (1963) 376; (b) R. Salovey, R.G. Shulman, W.M. Walsh Jr., J. Chem. Phys. 30 (1963) 839; (c) P.S. Pershan, R.G. Shulman, B.J. Wyluda, J. Eisinger, J. Physics 1 (1964) 163.
143. A.F. Fuciarelli, B.J. Wegher, W.F. Blakely, M. Dizdaroglu, Int. J. Radiat. Biol. 58 (1990) 397.
144. M.G. Ormerod, Int. J. Rad. Biol. 9 (1965) 291.
145. (a) R. Salovey, R.G. Shulman, W.M. Walsh, J. Chem. Phys. 39 (1963) 839; (b) P.S. Pershan, R.G. Shulman, B.J. Wyluda, J. Eisinger, Science 148 (1965) 378.
146. V.T. Srinivasan, B.B. Singh, A.R. Gopal-Ayengar, Int. J. Rad. Biol. 15 (1968) 89.
147. (a) R.A. Holroyd, J.W. Glass, Int. J. Rad. Biol. 14 (1968) 445; (b) T. Henriksen, Radiat. Res. 40 (1969) 11; (c) T. Henriksen, W.B.G. Jones, Radiat. Res. 45 (1971) 420.
148. T. Henriksen, W. Snipes, J. Chem. Phys. 52 (1970) 1997.
149. T. Henriksen, W. Snipes, Radiat. Res. 42 (1970) 255.
150. F. Jolibois, J. Cadet, A. Grand, R. Subra, N. Rega, V. Barone, J. Am. Chem. Soc. 120 (1998) 1864.
151. (a) J. Konnert, I.L. Karle, J. Karle, Acta Crystallogr. B26 (1970) 770; (b) J. Cadet, L. Voituriez, F.E. Hruska, S.L. Kan, F.A.A.M. de Leeuw, C. Altona, Can. J. Chem. 63 (1985) 2861.
152. (a) L.A. Eriksson, V.G. Malkin, O.L. Malkina, D.R. Salahub, J. Chem. Phys. 217 (1993) 24; (b) J. Kong, L.A. Eriksson, R.J. Boyd, Chem. Phys. Lett. 217 (1993) 24; (c) L.A. Eriksson, O.L. Malkina, V.G. Malkin, D.R. Salahub, J. Chem. Phys. 100 (1994) 5066; (d) M.A. Austen, L.A. Eriksson, R.J. Boyd, Can. J. Chem. 72 (1994) 695.
153. R.W. Fessenden, R.H. Schuler, J. Chem. Phys. 39 (1963) 2147.
154. A. Rassat, Pure Appl. Chem. 62 (1990) 223.
155. A. Ricca, J.M. Tronchet, J. Weber, Y. Ellinger, J. Phys. Chem. 96 (1992) 10779.
156. (a) L.J. Berliner, Ed., Spin Labeling: Theory and Applications (Academic Press, New York, 1976); (b) O.H. Griffith, A.S. Waggoner, Acc. Chem. Res. 2 (1969) 17; (c) T.J. Stone, T. Buckham, P.L. Nordio, H.M. McConnell, Proc. Natl. Acad. Sci. 54 (1965) 1010; (d) J.F.W. Keana, Chem. Rev. 78 (1978) 37.
157. B. Chion, J. Lajzérowicz, Acta Cryst. B24 (1968) 196.
158. H.M. McConnell, D.B. Chesnut, J. Chem. Phys. 28 (1958) 107.
159. (a) B.L. Bales, L. Messina, A. Vidal, M. Peric, O.R. Nascimento, J. Phys. Chem. 102 (1998) 10347; L.E. Almeida, I.E. Borissevitch, V.E. Yushmanov, M. Tabak, J. Coll. Interf. Sci. 203 (1998) 456.
160. L.J. Berliner, J. Reuben, Eds., Biological Magnetic Resonance, Vol. 8 (Plenum Press, New York, London, 1989).
161. J.F.W. Keana, M.J. Acarregui, S.L.M. Boyle, J. Am. Chem. Soc. 104 (1982) 827.
162. L.M.D. Buhre, L.A.M. Rupert, J.B.F.N. Engberts, Rec. Trav. Chim. Pays-Bas 107 (1988) 17.
163. (a) B.R. Knauer, J.J. Napier, J. Am. Chem. Soc. 98 (1976) 4395; (b) A.H. Reddoch, S. Kinishi, J. Chem. Phys. 70 (1979) 2121; (c) T. Abe, S. Tero-Kubota, Y. Ikegami, J. Phys. Chem. 86 (1982) 1358.
164. V. Barone, A. Bencini, A. di Matteo, J. Am. Chem. Soc. 119 (1997) 10831; V. Barone, A. Bencini, M. Cossi, A. di Matteo, M. Mattesini, F. Totti, J. Am. Chem. Soc. 120 (1998) 7069; V. Barone, A. di Matteo, F. Mele, I. de P.R. Moreira, F. Illas, Chem. Phys. Lett. 302 (1999) 240; C. Adamo, A. di Matteo, P. Rey, V. Barone, J. Phys. Chem. A 103 (1999) 3481; A. di Matteo, C. Adamo, M. Cossi, P. Rey, V. Barone, Chem. Phys. Lett. 310 (1999) 159; A. di Matteo, V. Barone, J. Phys. Chem. A 103 (1999) 7676.
165. (a) J.F.W. Keana, M.J. Acarregui, S.L.M. Boyle, J. Am. Chem. Soc. 104 (1982) 827; (b) V.V. Khramtsov, L.M. Weiner, I.A. Grigoriev, L.B. Volodarsky, Chem. Phys. Lett. 91 (1982) 69.
166. (a) G. Chapelet-Letourneux, H. Lemaire, A. Rassat, Bull. Soc. Chim. Fr. (1965) 3283; (b) A. Hudson, H.A. Hussain, J. Chem. Soc. B (1968) 251; (c) J.F.W. Keana, T.D. Lee, E.M. Bernard, J. Am. Chem. Soc. 98 (1976) 3052.
167. M.C.R. Symons, A. Pena-Nuñez, J. Chem. Soc. Faraday Trans. 1 81 (1985) 2421.
L.A. Eriksson (Editor)
Theoretical Biochemistry: Processes and Properties of Biological Systems
Theoretical and Computational Chemistry, Vol. 9 © 2001 Elsevier Science B.V. All rights reserved
Chapter 13
MODELLING ENZYME-LIGAND INTERACTIONS
M. J. Ramos, A. Melo and E. S. Henriques
CEQUP - Departamento de Química, Faculdade de Ciências da Universidade do Porto, Rua do Campo Alegre, 687, 4169-007 PORTO, Portugal
1. INTRODUCTION

Most physiological and pharmacological responses are mediated by specific receptor-ligand interactions. Receptors are macromolecules specialised in recognising a specific molecular pattern among the large number of surrounding molecular species with which they could interact; the term 'receptor' is used here to designate pharmacological receptors, enzymes, antibodies and DNA [1]. Ligands may range from small organic molecules to large biomolecules such as proteins or carbohydrates. The pharmaceutical industry aims at designing molecules that can mimic or disrupt the association of biological molecules. Computational approaches are now widely used in ligand design and in ranking potential new drugs, the goal being to select only the most promising candidates for synthesis and experimental characterisation. Considering that only about 0.01% of the screened compounds may actually be commercialised [1], such approaches contribute substantially to reducing pharmaceutical research costs, in both time and money. Empirical techniques have been widely used to correlate the properties of a series of ligands with their biological activities, searching for 2D and 3D similarity in large databases in order to define either pharmacophore or receptor models and to derive quantitative structure-activity relationships (QSAR) [2]. Such pharmacophore/receptor mapping techniques may prove particularly useful when no receptor structure is known, since the underlying idea is that a good ligand must be a 'complementary image' of its target receptor's binding site. Thus a pharmacophore, a map of the essential molecular characteristics a ligand must possess for high-affinity binding [3], allows a rapid analysis of which compounds might become potential ligands of a certain bioregulator. A receptor model differs from a pharmacophore in that it attempts to map the important features of the active site itself, based solely on ligand information [4]. A step further is the modelling of pseudoreceptors [5], which are three-dimensional (fully atomistic) receptor surrogates built 'around' the structures of known ligands.

With the increasing availability of three-dimensional structures of potential therapeutic targets, from diffraction crystallography [6, 7], NMR spectroscopy [8-10] or homology-modelling predictions, structure-based fitting methodologies have been developed, making structure-based drug design (SBDD) an alternative approach for ligand ranking and modelling. More recently, the possibility of combining information on the activities of a family of compounds with knowledge of the 3D structure of their target molecule has encouraged the development of new combinatorial strategies for modelling receptor-ligand interactions [11, 12].

Our attention will focus essentially on the modelling of enzyme-ligand interactions through SBDD methods, although we will not be concerned with whether or not the ligand is potentially a drug. Since much structural information is available for enzymes (even for enzyme-inhibitor complexes), it is no surprise that most reported uses of these methods involve enzymes as target receptors. SBDD attempts to rationalise binding affinities with computational techniques ranging from simple scoring functions to detailed free energy simulation methods, with molecular mechanics (MM) minimisations, conventional molecular dynamics (MD) and Monte Carlo (MC) simulations, and empirical ΔG predictions in between. In the past few years, hybrid methodologies (QM/CM) have brought quantum mechanics (QM), combined with classical mechanics (CM), into the process. The present chapter is not an attempt to treat this subject from a detailed theoretical perspective.
The emphasis here is on introducing the reader to some of the options available for studying enzyme-ligand interactions successfully. Accordingly, we will try to explain why one would want to use the qualitative and quantitative models employed in those studies. Recent literature examples will be mentioned to show how suitable the methodologies are for a given purpose.
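The pharmacophore idea described in this introduction can be made concrete with a small sketch: a pharmacophore is stored as a list of feature types plus the distances required between them, and a candidate ligand 'matches' if some assignment of its features reproduces those distances within a tolerance. This is a minimal illustration, not the algorithm of any particular program; the feature labels, the three-point pharmacophore, the 0.5 Å tolerance and the function name `matches_pharmacophore` are all hypothetical.

```python
import itertools
import math

# Hypothetical three-point pharmacophore: feature types and the required
# distances (angstrom) between pharmacophore point i and point j.
PHARMACOPHORE = {
    "features": ["donor", "acceptor", "hydrophobe"],
    "distances": {(0, 1): 4.0, (0, 2): 6.5, (1, 2): 5.0},
}

def matches_pharmacophore(ligand_features, pharmacophore, tol=0.5):
    """Return the first tuple of ligand-feature indices that maps onto the
    pharmacophore points (same types, all distances within `tol`), else None.

    ligand_features: list of (feature_type, (x, y, z)) tuples.
    """
    required = pharmacophore["features"]
    for combo in itertools.permutations(range(len(ligand_features)), len(required)):
        # Feature types must agree point by point.
        if any(ligand_features[k][0] != t for k, t in zip(combo, required)):
            continue
        # Every required inter-feature distance must be reproduced.
        coords = [ligand_features[k][1] for k in combo]
        if all(abs(math.dist(coords[i], coords[j]) - d) <= tol
               for (i, j), d in pharmacophore["distances"].items()):
            return combo
    return None

ligand = [("acceptor", (4.0, 0.0, 0.0)),
          ("donor", (0.0, 0.0, 0.0)),
          ("hydrophobe", (4.2, 5.0, 0.0))]
print(matches_pharmacophore(ligand, PHARMACOPHORE))  # -> (1, 0, 2)
```

The brute-force search over permutations is only practical for a handful of features; database screening tools use far more efficient matching, but the acceptance criterion (types plus distances within a tolerance) is the same idea.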
2. STRATEGIES IN ENZYME-LIGAND DESIGN

The major aspects that determine a biomolecular recognition event, all of them involving changes in both entropy and enthalpy, are: (i) the structural and energetic complementarity between the ligand and the receptor, (ii) the conformational rearrangements both structures undergo upon complexation, (iii) their desolvation, and (iv) the loss of rotational and translational freedom of the ligand when it binds to the active site [13]. The difficulty lies in quantifying and scaling the relative contributions of these aspects to the free energy of binding (and hence in predicting the affinity). Rigorous approaches such as free energy simulation methods, in which one attempts to calculate accurately the relative free energy of binding of closely related ligands using extensive molecular dynamics simulations [14], are too time-consuming for the initial screening of potential new inhibitors, which will have to compete effectively with the enzyme's natural substrate. Instead, a number of programs are now available to suggest and rank new ideas through a rapid analysis of the protein's active site followed by database searching and fragment or whole-molecule docking. To different extents, these programs attempt to translate the crucial contributions to ligand binding empirically into constraints, and assess the quality of the possible solutions using scoring functions. Enzyme-ligand interactions are usually non-covalent, mainly electrostatic (ionic and hydrogen bonding), van der Waals and hydrophobic interactions [15]. Hydrophobic interactions, promoted by the removal of apolar groups from the aqueous environment, are presently believed to contribute most to the binding free energy by increasing the entropy, but they do not account much for specificity. Specificity is instead achieved by optimal van der Waals (steric) and electrostatic complementarity between the protein receptor site and the ligand. Thus an initial screening is primarily concerned with finding the ligand that combines the best shape fit with the right electrostatic driving forces to guide it to the active site and add extra selectivity on site [14]. The structure of a co-crystallised enzyme-inhibitor complex would be an ideal starting point for such a study.
The designer would have an accessible conformation of the receptor with the induced-fit distortions introduced upon ligand binding already 'in place' [15]. Additionally, the inhibitor can be used as a scaffold for developing homologous ligands, the limitation being that a whole range of somewhat different compounds, perhaps with better affinities, will probably never be explored. Nevertheless, an empty binding site is sufficient for SBDD, and it is up to the modeller to decide to what extent induced fit should be considered.
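All four contributions listed above end up in a single measurable quantity, the standard binding free energy. For reference, the standard thermodynamic relations (not written out in this chapter) that link it to the enthalpy and entropy changes and to the dissociation constant $K_d$, with $c^{\circ} = 1\,\mathrm{M}$ as the standard state, are:

$$
\Delta G^{\circ}_{\mathrm{bind}}
  = \Delta H^{\circ}_{\mathrm{bind}} - T\,\Delta S^{\circ}_{\mathrm{bind}}
  = RT \ln\frac{K_d}{c^{\circ}},
\qquad
\Delta\Delta G^{\circ}_{2\leftarrow 1} = RT \ln\frac{K_d^{(2)}}{K_d^{(1)}}
$$

For a competitive inhibitor the inhibition constant $K_i$ plays the role of $K_d$. At 298 K, $RT \ln 10 \approx 1.36$ kcal/mol, so each tenfold change in affinity corresponds to roughly 1.4 kcal/mol in binding free energy.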
2.1. Receptor Homology-Built Models

The subject of enzyme homology modelling should be addressed first. While there might be no 3D structural data for the receptor site of interest, it is often possible to find structural information for a related protein or group of proteins in the same family, in which case building a model of the target using the known structure(s) of the homologous protein(s) as template(s) is a powerful and reliable method [16, 17]. Since during evolution the 3D structures of homologous proteins have been much better preserved than their primary structures, it has been shown that comparative molecular modelling is highly successful when the sequence identity between the target and the template structure is high (over 70%) [17]. Database search programs like FASTA [18] or BLAST [19] have been optimised to detect evolutionary relationships between proteins, and are adequate for template recognition and (multiple) sequence alignment when the sequence identity is over 25-30% [20]. The general procedure is then to assume that the backbone of the model is identical to that of the template structure and to add the side chains onto it [21], although some difficulties may arise with insertions, deletions and regions of low local similarity. In some cases, the homology concept can be extended to water molecules buried within the protein matrix [22], and there are already homology-built proteases that include a set of such conserved waters as an important intrinsic feature of the structure [23, 24]. As far as the homology modelling process is concerned, conserved buried waters impose a new type of structural constraint that would otherwise be impossible to define during structure refinement [23]. Automatic homology modelling is gradually becoming a routine technique, but the modeller is still frequently faced with multiple, seemingly similar choices during the multi-step modelling process [21], and much work is done using interactive molecular graphics packages like WHAT IF [25], QUANTA [26] or Insight [27]. Of crucial importance is the assessment of the model's quality. Errors will always occur, and once they have been identified, the difficulty lies in defining the criteria for judging their relevance. In modelling a proteolytic enzyme, for instance, misplacing a surface loop may not be so important if it lies at a 'safe' distance from the active site.
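As a rough illustration of how the identity thresholds quoted above might be applied, the sketch below computes the percent identity of a pairwise alignment and maps it onto the ranges mentioned in the text [17, 20]. Identity is computed over positions where neither sequence has a gap, which is only one of several conventions in use; the function names and the wording of the categories are hypothetical.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over aligned positions where neither sequence has a gap ('-')."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)

def expected_model_quality(identity: float) -> str:
    """Map an identity value onto the ranges quoted in the text (labels are a paraphrase)."""
    if identity > 90:
        return "errors may approach those of crystal structures"
    if identity > 70:
        return "comparative modelling highly successful"
    if identity >= 25:
        return "template recognition and alignment by FASTA/BLAST adequate"
    return "below the range where sequence search is reliable"

ident = percent_identity("ACDE-G", "ACDFHG")
print(ident, expected_model_quality(ident))
```

Here the gapped column is ignored and four of the five remaining positions are identical, giving 80%, which falls in the 'over 70%' range. Real templates are of course judged on full-length alignments.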
Verification techniques like PROCHECK [28], which check experimentally resolved structures against normality indices, can be applied in the same way to homology-built models. In cases where the target and the template share a sequence identity of more than 90%, the errors may be similar to (as small as) those found in crystallographically determined structures [20], but more often the root mean square (RMS) deviations of the model are higher than those of the corresponding template(s). Errors like bumps between atoms, or an occasional abnormal bond length or angle, can be removed by allowing the model to relax during a few energy minimisation steps. Long energy minimisation runs, however, introduce many small errors [21] and are usually not very useful. Once the model is sufficiently refined to satisfy the desired quality criteria, it is just a matter of choosing among the many available computational SBDD tools for studying receptor (model)-ligand interactions. One interesting example is the design of a new inhibitor (de novo design) for the aspartyl protease renin [29]. Renin carries out the first step of the renin-angiotensin cascade, which is implicated in the increase of systemic blood pressure leading to hypertension. The crystal structure of the enzyme was not reported until 1989 [30], so Moon and Howe [29] used the homology-built model proposed by Carlson et al. [31] to create possible new inhibitors with their SBDD program GROW: upon synthesis, the best-scoring peptide was found to inhibit the enzyme with a Ki of 30 μM.
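Inhibition constants such as the Ki of 30 μM quoted for the renin inhibitor can be converted into an approximate binding free energy with ΔG° ≈ RT ln(Ki / 1 M), assuming that Ki approximates the dissociation constant of the enzyme-inhibitor complex and a 1 M standard state. The helper names below are hypothetical; the sketch uses only the standard library.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def binding_free_energy(ki_molar: float, temperature: float = 298.15) -> float:
    """Approximate standard binding free energy (kcal/mol) from an inhibition
    constant in mol/L, assuming Ki ~ Kd and a 1 M standard state."""
    return R_KCAL * temperature * math.log(ki_molar)

def ddg_from_fold_change(fold: float, temperature: float = 298.15) -> float:
    """Free-energy gain (kcal/mol) corresponding to a `fold`-times lower Ki."""
    return R_KCAL * temperature * math.log(fold)

print(f"{binding_free_energy(30e-6):.2f}")  # renin inhibitor, Ki = 30 uM -> -6.17
print(f"{ddg_from_fold_change(10):.2f}")    # tenfold potency gain -> 1.36
```

A 30 μM inhibitor therefore binds with roughly -6.2 kcal/mol, and a modification that makes an inhibitor nearly one order of magnitude more potent gains somewhat less than 1.4 kcal/mol.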
2.2. Mapping the Binding Region

In order to characterise the three-dimensional design constraints imposed by the receptor, a first class of methods attempts to locate target sites within the binding cavity that will yield crucial favourable interactions with predefined probes. The probes, which can be individual atoms (site points) or small fragments such as functional groups (carbonyl or methyl, for example) or water molecules, search for 'hot spots' in the active site, as illustrated in Figure 1. Examples of these methods are GRID [32, 33], MCSS [34], HSITE [35] and LUDI [36].
Figure 1. Schematic representation of the fragment placing methodology.
GRID and MCSS use classical mechanics potential functions for scoring and selecting the fragments, and are usually called energy-based methods [37].
GRID constructs a 3D lattice of regularly spaced points within the active site and evaluates the interaction energy between the fragment and the receptor at each grid point. The design of thymidylate synthase inhibitors reported by Appelt and co-workers [38] is one example of a successful GRID application. In the more recently developed MCSS method [34], the CHARMM force field [39] was modified to allow several thousand copies of the probe to be placed simultaneously and randomly in the active site without 'feeling' each other (there are no nonbonded interactions between the copies); upon energy minimisation and/or dynamics simulation, the copies tend to cluster in energetically favourable regions. Compared to GRID, MCSS has the advantages of revealing preferred orientations of the functional group and of sampling the actual, entire active site instead of a grid representation of it. Caflisch et al. [40] suggested a number of modifications for increasing the potency of MVT-101 based on the results of a multiple-copy simultaneous search. MVT-101 is a peptidic inhibitor of HIV-1 protease, the aspartyl protease of one form of the human immunodeficiency virus (HIV) and a major pharmaceutical target in the treatment of AIDS. Unlike GRID and MCSS, HSITE and related programs are (faster) rule-based methods, which derive their acceptance rules from extensive analysis of experimental data. HSITE rules are based on the structural information stored in the Protein Data Bank (PDB), and the program maps the active region of the protein for hydrogen-bonding sites (the H in the program's name). Related programs developed for the sole purpose of placing water molecules [41-43] are able to identify rapidly those water molecules that are conserved in the active sites of homologous proteins.
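A GRID-like scan can be sketched in a few lines of NumPy: build a regular lattice over a box enclosing the site, evaluate a probe-receptor interaction energy at every lattice point and report the most favourable point. The three-atom 'receptor', its charges, the shared Lennard-Jones parameters and the distance-dependent dielectric (ε = 4r) are all illustrative assumptions; GRID itself uses its own empirical energy function, which is not reproduced here.

```python
import numpy as np

# Toy receptor: coordinates (angstrom) and partial charges (e). Made-up values.
receptor_xyz = np.array([[0.0, 0.0, 0.0],
                         [3.5, 0.0, 0.0],
                         [0.0, 3.5, 0.0]])
receptor_q = np.array([-0.5, 0.4, -0.3])
EPS = 0.15        # kcal/mol, Lennard-Jones well depth shared by all pairs (assumption)
RMIN = 3.5        # angstrom, position of the LJ minimum (assumption)
COULOMB = 332.06  # kcal*angstrom/(mol*e^2)

def probe_energy(points, probe_q):
    """LJ 12-6 + Coulomb (dielectric 4r) energy of a point probe at each row of `points`."""
    r = np.linalg.norm(points[:, None, :] - receptor_xyz[None, :, :], axis=-1)
    r = np.maximum(r, 0.5)  # avoid the singularity on top of receptor atoms
    lj = EPS * ((RMIN / r) ** 12 - 2.0 * (RMIN / r) ** 6)
    elec = COULOMB * probe_q * receptor_q / (4.0 * r * r)
    return (lj + elec).sum(axis=1)

def grid_scan(lower, upper, spacing, probe_q):
    """Evaluate the probe energy on a regular 3D lattice; return points and energies."""
    axes = [np.arange(lo, hi + 1e-9, spacing) for lo, hi in zip(lower, upper)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    points = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
    return points, probe_energy(points, probe_q)

points, energies = grid_scan((-2, -2, -2), (6, 6, 2), 0.5, probe_q=0.3)
best = points[np.argmin(energies)]
print("most favourable probe position:", best, "E =", round(float(energies.min()), 2))
```

In practice the lattice energies are usually contoured and inspected graphically rather than reduced to a single minimum; the grid spacing trades resolution against the number of energy evaluations.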
Instead of the direct fragment location methodology, LUDI uses a site point connection approach [37] to guide fragment placement: it determines the preferred locations of individual atoms (interaction sites) within the binding region, and then searches for fragments that bear the appropriate spatial arrangement of atoms to occupy one or more of those locations. LUDI is rule-based, the rules being derived from the nonbonded contact geometries occurring in reliable protein-ligand structures. The use of LUDI is well illustrated by the work of Pisabarro et al. [44] on the improvement of an existing inhibitor of human synovial fluid phospholipase A2 (HSFPLA2), an enzyme implicated in inflammatory processes. An extra hydrophobic pocket in the enzyme active site was revealed by an initial mapping made with GRID, and LUDI was then used to search for possible substituents to fill that pocket. The result was a modified inhibitor that was found to be nearly one order of magnitude more potent than the original one. Whether fragment placement is direct or via site point connection, these methods have the obvious advantage of rapidly locating the hot spots in the active site and suggesting optimal prototype fragments to be placed there. One major limitation is that they do not readily produce a 'complete' ligand. There are usually gaps between the well-placed fragments that need to be filled with reasonable bridging groups to connect them; a number of programs have been developed for that purpose and are discussed in the next subsection. It should be pointed out that the site point connection approach can be extended to search for those bridging groups, and the LUDI [36] program may be used in that way, usually linking two nearby fragments at a time.
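The multiple-copy idea behind MCSS, described earlier in this subsection, can be illustrated on a toy two-dimensional energy surface: many copies of a probe are dropped at random, each copy is minimised independently (copies never interact), high-energy copies are discarded and the survivors are clustered into candidate sites. The Gaussian-well surface, the steepest-descent minimiser, the -0.5 kcal/mol cutoff and the clustering radius are hypothetical choices for this sketch; MCSS itself works in three dimensions with the CHARMM force field.

```python
import math
import random

# Hypothetical 2D "binding site": Gaussian wells (centre_x, centre_y, depth in kcal/mol).
WELLS = [(1.0, 1.0, 3.0), (-2.0, 0.5, 2.0), (0.5, -2.5, 1.0)]
WIDTH = 0.8  # common well width (hypothetical)

def energy(x, y):
    """Toy probe energy at (x, y); negative values are favourable."""
    return -sum(d * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * WIDTH ** 2))
                for cx, cy, d in WELLS)

def gradient(x, y):
    """Analytic gradient of energy(x, y)."""
    gx = gy = 0.0
    for cx, cy, d in WELLS:
        w = d * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * WIDTH ** 2)) / WIDTH ** 2
        gx += w * (x - cx)
        gy += w * (y - cy)
    return gx, gy

def minimise(x, y, step=0.05, n_steps=1000):
    """Steepest descent for a single copy; copies never 'feel' one another."""
    for _ in range(n_steps):
        gx, gy = gradient(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y

def cluster(points, radius=0.3):
    """Greedy clustering: a point joins the first cluster whose seed lies
    within `radius`, otherwise it seeds a new cluster."""
    clusters = []
    for p in points:
        for c in clusters:
            if math.dist(p, c[0]) < radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

random.seed(0)
starts = [(random.uniform(-4, 4), random.uniform(-4, 4)) for _ in range(200)]
minima = [minimise(x, y) for x, y in starts]
retained = [p for p in minima if energy(*p) < -0.5]  # discard high-energy copies
sites = sorted(cluster(retained), key=lambda c: energy(*c[0]))
for c in sites:
    print(f"site near ({c[0][0]:.2f}, {c[0][1]:.2f}): {len(c)} copies, "
          f"E = {energy(*c[0]):.2f}")
```

The number of copies ending in each cluster gives a crude indication of how large each favourable region is, while the cluster energies rank the sites.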
2.3. Assembling the Ligand

A second class of methods uses the information about favourable fragment locations obtained with any fragment placement program and connects the fragments with 'linkers'. In general, this is a single-step process and only one linking structure is used to join the key fragments. Since it is assumed that the active site hot spots are already occupied by the functional groups necessary for tight binding, the linker is not expected to contribute significantly to the binding energy. Accordingly, the linker just needs to have the correct geometry to bridge the isolated fragments while avoiding undesirable bumps with the receptor protein, the result being a complete ligand. Some of the better-known programs for linking fragments are CAVEAT [45] and HOOK [46]. Linkers can be searched for in a variety of databases, whether libraries of known compounds like the Cambridge Structural Database (CSD) or the newer synthetic databases TRIAD (with over 400 000 chemically reasonable tricyclic rings) and ILIAD (around 100 000 acyclic linkers) [37] for use with the more recent versions of CAVEAT. Ring systems usually represent a more attractive linker solution, since their conformational rigidity implies smaller entropy losses on binding compared with the more flexible acyclic structures. Figure 2 is a schematic example of how a general fragment linkage method works. In CAVEAT, the 'loose' bonds of the isolated, already placed fragments and those a candidate 'scaffold' might form are treated as vectors, with the bond atoms defined as base and tip. The scaffolds are stored in a database and their tip atoms are suitable for replacement. The program compares the relationships between the vectors in the fragments with those in the trial linker and selects the best matches.
CAVEAT treats any number of fragments simultaneously, merges them with each matching scaffold, and the assembled ligand structures then undergo further testing to filter out those with bad linker-receptor contacts. HOOK uses molecular 'skeletons', each with two or more connection points (specific bonds called hooks), and requires a free methyl group on each isolated fragment to be linked. If such a group exists, the fragment is connected directly to the skeleton via bond fusion; otherwise an extra methyl can be used as a spacer to link the functional group to the hook.
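The vector comparison used by CAVEAT can be illustrated with pair descriptors that do not change under rotation or translation: for two bond vectors, each given as (base, tip), one can take the base-base distance, the tip-tip distance and the angle between the vectors, and accept a scaffold whose descriptor agrees with that of the placed fragments within tolerances. This is a simplified sketch, not CAVEAT's actual matching criteria; the descriptor, the tolerances, the scaffold names and the assumption that fragment and scaffold vectors are oriented the same way are all choices made for illustration.

```python
import math

def angle_deg(u, v):
    """Angle between two 3-vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def pair_descriptor(v1, v2):
    """Rotation/translation-invariant description of two bond vectors,
    each given as (base_xyz, tip_xyz)."""
    (b1, t1), (b2, t2) = v1, v2
    d1 = [t - b for t, b in zip(t1, b1)]
    d2 = [t - b for t, b in zip(t2, b2)]
    return math.dist(b1, b2), math.dist(t1, t2), angle_deg(d1, d2)

def matches(fragment_pair, scaffold_pair, d_tol=0.5, a_tol=15.0):
    """True if both distances agree within d_tol and the angle within a_tol."""
    f = pair_descriptor(*fragment_pair)
    s = pair_descriptor(*scaffold_pair)
    return (abs(f[0] - s[0]) <= d_tol and abs(f[1] - s[1]) <= d_tol
            and abs(f[2] - s[2]) <= a_tol)

def find_scaffolds(fragment_pair, library, **tolerances):
    """Names of library scaffolds whose vector pair matches the fragments'."""
    return [name for name, pair in library.items()
            if matches(fragment_pair, pair, **tolerances)]

# Two placed fragments whose loose bonds point the same way, 4 angstrom apart.
fragments = (((0.0, 0.0, 0.0), (1.5, 0.0, 0.0)),
             ((0.0, 4.0, 0.0), (1.5, 4.0, 0.0)))
# Hypothetical scaffold library (coordinates in each scaffold's own frame).
library = {
    "parallel_linker": (((10.0, 10.0, 10.0), (11.5, 10.0, 10.0)),
                        ((10.0, 14.0, 10.0), (11.5, 14.0, 10.0))),
    "divergent_linker": (((0.0, 0.0, 0.0), (1.5, 0.0, 0.0)),
                         ((0.0, 4.0, 0.0), (0.0, 5.5, 0.0))),
}
print(find_scaffolds(fragments, library))  # -> ['parallel_linker']
```

Because the descriptors are invariant, scaffolds can be matched in their own coordinate frames; only after a match is found does the scaffold need to be superimposed on the fragments and checked for bumps with the receptor.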
Figure 2. Schematic representation of fragment linkage methods. In A a tricyclic ring linker connects three previously placed key fragments avoiding bumping the receptor. In B a bump occurs (signalled by the arrow).
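The bump illustrated in panel B, and the simplified van der Waals 'overlap score' used by HOOK, can both be approximated by summing interpenetration depths over atom pairs that come closer than a fraction of the sum of their van der Waals radii. The Bondi radii are standard values, but the 0.8 scaling factor, the scoring form and the function name are assumptions; HOOK's actual overlap score is not reproduced here.

```python
import math

# Bondi van der Waals radii (angstrom).
VDW = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def overlap_score(ligand, protein, scale=0.8):
    """Sum of interpenetration depths over ligand-protein atom pairs closer
    than `scale` x (sum of vdW radii). Atoms are (element, (x, y, z)).
    Returns (score, bumps), bumps being (ligand_idx, protein_idx, distance)."""
    score, bumps = 0.0, []
    for i, (el_l, xyz_l) in enumerate(ligand):
        for j, (el_p, xyz_p) in enumerate(protein):
            limit = scale * (VDW[el_l] + VDW[el_p])
            d = math.dist(xyz_l, xyz_p)
            if d < limit:
                score += limit - d
                bumps.append((i, j, round(d, 2)))
    return score, bumps

linker = [("C", (0.0, 0.0, 0.0)), ("O", (1.2, 0.0, 0.0))]
pocket = [("N", (3.0, 0.0, 0.0)), ("C", (0.0, 5.0, 0.0))]
print(overlap_score(linker, pocket))  # one bump: linker O vs pocket N at 1.8 angstrom
```

A score of zero means no bumps; candidate linkers with a positive score would be rejected or penalised.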
The skeletons can be derived from known libraries like the CSD or created with a program that generates simple carbon frames, and a simplified form of the van der Waals contacts is used to estimate the 'overlap score' between the protein and the putative ligand(s) [47]. MCSS minima are typical input for HOOK. The MCSS-HOOK methodology was applied in a survey of the human thrombin binding site [47]. Thrombin, a serine protease in the blood coagulation cascade, is an important enzyme in cardiovascular research. The best candidate inhibitor proposed was a tricyclic ring connecting five MCSS minima from the eight probe fragments initially used; the new structure was able to form most of the non-covalent interactions observed between the enzyme and the known inhibitor PPACK (D-Phe-Pro-Arg-chloromethylketone). Within the fragment linkage methodology, some programs use building blocks (atoms, chains, ring systems) to assemble the linker skeletons that best fit the pre-located functional groups, two examples being PRO_LIGAND [48] and NEWLEAD [49]. The methods described so far allow the very fast generation of thousands of possible ways to connect key fragments. There is always the issue of linker flexibility (structures with several accessible conformers), which might be addressed by using multiconformational databases or by refining the procedures to account for it, at the cost of longer search times. Synthetic accessibility is another issue to consider, since many of the ideas generated are in practice not feasible for chemical synthesis. Another class of methods for assembling putative ligands uses build-up procedures to construct the ligand sequentially, piece by piece, requiring that each added building block contributes significantly to receptor binding. Building blocks can vary widely in both diversity and size, from 'simple' atoms (small functional groups) to aromatic rings, or even amino acids.
The procedure generally starts by placing a first building block in a matching region of the active site and then growing the ligand from there, either linearly, by adding each new piece to the previously accepted one, or in a branching mode, i.e., the next block may be connected anywhere in the already assembled fragment. Programs employing this methodology include GrowMol [50], which uses an atom-by-atom build-up, and GROW [26], which uses a fragment-by-fragment approach instead. GrowMol uses a random algorithm for selecting and orienting each functional group to be placed. Scoring is carried out in a rule-based manner according to complementarity with the active site; the rules for accepting the newly grown ligand account for the internal strain energy of its bound conformation and for the experimental data available for the particular receptor. Unlike GrowMol, which is capable of generating an enormous diversity of organic compounds, many with synthetic accessibility problems, GROW was conceived to search for peptide ligands for a given enzyme, its building blocks being essentially amino acid residues. Each trial residue is allowed a number of pre-stored conformations, and the growing peptides are scored with the AMBER force field [51] plus some solvation terms [26]. Only those conformations that produce the most energetically favourable results are retained for the next growing step. Since a poorly scoring conformation in the middle of the ligand might sometimes lead to the best overall binding solution, which would be missed if the sequential build-up approach were used alone, GROW also provides an algorithm for probabilistic exploration of the conformational space, intended to catch those 'special' cases. A validated application of GROW was mentioned earlier in this chapter, in the homology modelling subsection. A striking advantage of programs like GrowMol and GROW is the possibility of conformational scanning as each piece is connected. Problems may arise when the growing ligand must cross zones of the receptor active region where no significant binding contribution is expected (dead zones [37], where scoring fails). Finally, a whole new class of methods exists that assemble ligands in a random fashion, usually beginning with a 'soup' of fundamental building blocks placed in the active site and allowing bonds between them to be formed and/or broken at any time. A brief, interesting review of these special methods can be found in reference 37. The work of Gehlhaar et al. [52] illustrates a structure-based modification of a lead compound using one such method, the MCDNLG (Monte Carlo de novo ligand generator) program, to improve an HIV-1 protease inhibitor.
This work also points out that, although the ligand design techniques described so far are intrinsically 'de novo' generators, they can prove very useful in modifying existing ligands, i.e., creating new, hopefully more potent and/or specific ligands by using 'old' ones as seeds and adding or replacing the necessary fragments.
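Why GROW supplements sequential build-up with probabilistic exploration can be shown with a toy three-position 'peptide' scored by a hypothetical table of per-position and neighbour terms. A greedy build-up keeps only the best choice at each step, whereas a Metropolis Monte Carlo search over whole sequences can accept a locally worse residue and so reach a better overall solution. The residue alphabet, the scores, the temperature and the step count are invented for illustration; GROW's own algorithm, which works on pre-stored residue conformations scored with AMBER, is not reproduced here.

```python
import math
import random

ALPHABET = "ABC"
# Hypothetical per-position contributions (lower is better).
SITE = [{"A": -2.0, "B": -1.5, "C": 0.0},
        {"A": 0.0, "B": 0.0, "C": -1.0},
        {"A": -1.0, "B": -1.0, "C": -1.0}]
# Hypothetical neighbour terms; unlisted pairs contribute 0.
PAIR = {("A", "C"): 2.0, ("B", "C"): -1.0}

def score(seq):
    """Score of a (possibly partial) sequence: site terms plus neighbour terms."""
    s = sum(SITE[i][r] for i, r in enumerate(seq))
    return s + sum(PAIR.get((a, b), 0.0) for a, b in zip(seq, seq[1:]))

def greedy_grow():
    """Sequential build-up: keep only the best-scoring choice at each step."""
    seq = ""
    for _ in range(len(SITE)):
        seq = min((seq + r for r in ALPHABET), key=score)
    return seq

def metropolis_search(n_steps=2000, temperature=1.0, seed=1):
    """Random single-residue changes accepted with the Metropolis criterion,
    starting from the greedy solution; returns the best sequence visited."""
    rng = random.Random(seed)
    seq = best = greedy_grow()
    for _ in range(n_steps):
        i = rng.randrange(len(seq))
        trial = seq[:i] + rng.choice(ALPHABET) + seq[i + 1:]
        delta = score(trial) - score(seq)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            seq = trial
            if score(seq) < score(best):
                best = seq
    return best

print(greedy_grow(), score(greedy_grow()))  # -> AAA -3.0
best = metropolis_search()
print(best, score(best))
```

The greedy route commits to residue A at the first position because it scores best locally, then pays a clash penalty if it places C next; the Monte Carlo search reaches a sequence starting with B and C that is 1.5 units better overall.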
2.4. Docking the Ligand
Instead of attempting to assemble a possible ligand structure inside a receptor's active site, the modeller might decide to initiate the enzyme-ligand study by searching for new ligands among the diversity of known compounds that are readily synthesised or available for purchase. It may also be important to check whether a natural inhibitor of a certain enzyme is capable of inhibiting a related one, and why, or even to assess the selectivity of two or more different receptors towards a particular ligand. Whichever the case, a thorough knowledge of the representative binding modes of the ligand(s) towards the receptor(s) of interest is necessary, and those binding modes are usually explored with docking techniques. There are many approaches to molecular docking, and the subject has recently been reviewed [53, 54]. Automated docking can be used to explore the
many plausible binding orientations and interactions of the putative ligand(s) at a well-defined active region, or to probe an entire receptor structure in search of potential binding pockets. Such thorough searches are carried out via shape-fitting techniques and random methods such as Monte Carlo simulated annealing or genetic algorithms. The interaction energies of all scanned receptor-ligand conformations/orientations are estimated, with the purpose of finding the low-energy bound conformer(s). Once again, fast scoring schemes are needed, since the number of possibilities explored by docking techniques makes rigorous energy simulations as prohibitively time-consuming as they are for the previously mentioned de novo ligand assembly methodologies. DOCK [55-57] is probably one of the first, best known and most widely used SBDD programs, originally designed to perform rigid-body fitting of user-supplied compounds into a receptor active site. A set of overlapping spheres of different sizes is matched against the Connolly molecular surface of the binding cavity, thus defining the volume to be filled. The sphere centres are taken as potential ligand atom positions, and distances between those centres are compared with distances between the atoms of each trial compound, in search of a combination of matches within some distance tolerance. Each time a hit is found, the candidate ligand is rotated and translated to give a least-squares fit to the centres. The recently distributed version 4.0 of DOCK also includes routines for handling ligand flexibility, the flexible molecule being treated as a collection of rigid segments separated by rotatable bonds.
The conformational search is carried out either by assembling and minimising the molecule one segment at a time, starting from a pre-docked 'anchor' segment (in which case the process is repeated for each docked orientation of the anchor), or by generating and minimising the entire conformer in one step and then docking (orienting) each conformation independently. The scoring accounts for steric (shape-fitting) and electrostatic complementarity by means of a grid-based algorithm. Chemical scoring is also allowed, which consists of labelling some user-selected centres with certain desirable properties (e.g., hydrogen bond donor/acceptor). DOCK has been used to screen ligands for a number of systems, a striking example being the search for novel inhibitors of thymidylate synthase (TS) [58]; this enzyme plays an essential role in DNA synthesis and is an important target for certain chemotherapeutic drugs. A 3D version of the Fine Chemicals Database (FCD) was the source of the compounds to dock, and a number of hit structures, significantly dissimilar from the natural TS substrate and not previously known to inhibit the enzyme, showed high activity when tested experimentally. Furthermore, when one of the new leads was found to co-crystallise in an unpredicted TS binding region, DOCK was used again to probe that particular region, and the result was another group of new leads.
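The two geometric steps of the rigid-body matching described above (checking that paired ligand-atom and sphere-centre distances agree within a tolerance, then superimposing the ligand on the centres by least squares) can be sketched as follows. This is a minimal illustration, not DOCK's implementation: the correspondence list `pairs` is assumed to come from a separate matching search, and the least-squares fit uses the standard Kabsch SVD construction.

```python
import numpy as np


def distances_compatible(ligand_xyz, sphere_xyz, pairs, tol=1.0):
    """True if every pair of matched (atom, sphere) correspondences has
    |d(atom_i, atom_j) - d(sphere_p, sphere_q)| <= tol.
    `pairs` is a list of (atom_index, sphere_index) tuples."""
    for a in range(len(pairs)):
        for b in range(a + 1, len(pairs)):
            (i, p), (j, q) = pairs[a], pairs[b]
            d_lig = np.linalg.norm(ligand_xyz[i] - ligand_xyz[j])
            d_sph = np.linalg.norm(sphere_xyz[p] - sphere_xyz[q])
            if abs(d_lig - d_sph) > tol:
                return False
    return True


def kabsch_fit(mobile, target):
    """Rotation R and translation t minimising sum ||R @ x + t - y||^2
    over paired rows x of `mobile` and y of `target` (both shape (n, 3))."""
    mob_c = mobile.mean(axis=0)
    tgt_c = target.mean(axis=0)
    H = (mobile - mob_c).T @ (target - tgt_c)      # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ mob_c
    return R, t
```

The fitted pose is `ligand_xyz @ R.T + t`; a candidate that passes the distance test would then be passed to the (grid-based) scoring step.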
Monte Carlo (MC) and simulated annealing (SA) approaches, together with some MC/SA hybrids, are known to be efficient alternatives to conventional MD for sampling the accessible conformational space, and are also applicable to flexible ligand fitting or multiple ligand orientation searches [59, 60]. Such docking approaches are much slower than strict shape-fitting rigid-body ones, and are used to search the many flexible binding possibilities of a ligand rather than to scan extensive compound databases for new leads. This is the case with AUTODOCK [61], a program that uses the hybrid MC/SA technique, in which the system is allowed to anneal (cool slowly) from a high starting temperature and Metropolis sampling is used. The ligand torsional angles and its orientation relative to the target are varied, and a grid-based energy representation is used for rapid evaluation of the affinity. Genetic algorithm (GA) methods have recently been introduced into chemistry, and their reported docking successes in reproducing the experimentally known bound conformations of a variety of ligands [62] are encouraging. The underlying idea of GA-based docking is that the possible ligand conformations are encoded like chromosomes. Thus a group of random sets of torsional angles (the chromosomes) is allowed to 'evolve' and 'breed' a first generation of solutions. The best-scoring molecules then have better chances of being selected as 'parents' of the next generation, and breeding continues by 'swapping' or 'crossing over' segments on either side of a randomly selected torsional angle. A superior set of solutions should be attained by repeating the process a reasonable number of times (generations). Flexible GA-based docking runs are time-consuming and therefore not suitable for screening large databases.
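The GA scheme just described (torsion angles as genes, better-scoring parents, one-point crossover at a random torsion) can be sketched in a few lines. This is an illustrative sketch, not the algorithm of any particular docking program: truncation selection, elitism and the small mutation operator are assumptions added to make it a working GA, and `torsion_deviation` is a hypothetical toy score standing in for a real receptor-ligand energy.

```python
import math
import random
from typing import Callable, List, Tuple

Chromosome = List[float]  # one gene per rotatable torsion angle, in degrees


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """One-point crossover: swap the segments on either side of a random torsion."""
    cut = rng.randrange(1, len(a))  # requires at least two torsions
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(c: Chromosome, rng: random.Random, rate: float = 0.1, step: float = 30.0) -> Chromosome:
    """Randomly perturb a few torsions (an extra operator, not described in the text)."""
    return [(g + rng.uniform(-step, step)) % 360.0 if rng.random() < rate else g for g in c]


def evolve(
    energy: Callable[[Chromosome], float],
    n_torsions: int,
    pop_size: int = 40,
    generations: int = 60,
    seed: int = 0,
) -> Tuple[Chromosome, List[float]]:
    """Minimise `energy`; returns the best chromosome and the best energy
    found at the start of each generation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(0.0, 360.0) for _ in range(n_torsions)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        pop.sort(key=energy)                 # lowest (best) energy first
        history.append(energy(pop[0]))
        parents = pop[: pop_size // 2]       # the better-scoring half may breed
        children = [pop[0], pop[1]]          # elitism: the two best survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            for child in crossover(a, b, rng):
                children.append(mutate(child, rng))
        pop = children[:pop_size]
    pop.sort(key=energy)
    return pop[0], history


def torsion_deviation(c: Chromosome, target: Chromosome) -> float:
    """Toy stand-in for a docking score: zero when every torsion matches `target`."""
    return sum(1.0 - math.cos(math.radians(g - t)) for g, t in zip(c, target))
```

Because the two best chromosomes are copied unchanged into each new generation, the best energy in `history` can never increase; without elitism a GA can lose its best solution between generations.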
Docking methodologies may also be used to further refine a set of ligand solutions, whether these have been built by ligand assembly programs or produced by a first docking run. Possibilities include redocking the hits with increased sampling of orientations and/or conformations, and rescoring with alternative scoring functions.
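The hybrid MC/SA search attributed to AUTODOCK above combines Metropolis sampling at a given temperature with slow cooling. The sketch below shows only that generic technique on a plain list of coordinates; the geometric cooling schedule, the uniform random displacement and all default parameters are assumptions for illustration, not AUTODOCK's settings, and energies are in units of kT.

```python
import math
import random
from typing import Callable, List, Tuple


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept downhill moves, uphill with exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def anneal(
    energy: Callable[[List[float]], float],
    x0: List[float],
    step: float,
    kT_start: float = 2.0,
    kT_end: float = 0.01,
    cooling: float = 0.95,
    moves_per_temperature: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Metropolis sampling at each temperature, then geometric cooling."""
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    kT = kT_start
    while kT > kT_end:
        for _ in range(moves_per_temperature):
            # Random displacement of every degree of freedom
            # (in docking: position, orientation and torsions).
            trial = [xi + rng.uniform(-step, step) for xi in x]
            e_trial = energy(trial)
            if metropolis_accept(e_trial - e, kT, rng):
                x, e = trial, e_trial
                if e < best_e:
                    best_x, best_e = list(x), e
        kT *= cooling
    return best_x, best_e
```

At high temperature uphill moves are frequently accepted, letting the search escape local minima; as `kT` falls the walk settles into a low-energy basin. The best configuration seen is returned rather than the last one.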
2.5. Scoring the Ligand
From what has been written so far, it is clear that de novo design and docking programs quickly explore an enormous amount of chemical and conformational space (typically thousands of structures) and need fast filter procedures to separate poorer-quality test solutions from the rest. In terms of computer time, it is impossible to calculate binding affinities from first principles when large-scale filtering is at stake; simple schemes to score the likely binding tightness of each trial structure to the active site have to be employed instead. These empirical scoring schemes (scoring functions) derive their constraints and weighting factors mainly from structural parameters of ligand-receptor complexes. As regards estimating binding affinity, scoring functions for de novo design attempt to overcome the limitations and approximations of the traditional QSAR approaches, in particular the low level of confidence their predictions offer for novel ligands that bear striking dissimilarities to those used to build the QSAR equation [2]. Rather than giving true estimates of the binding affinity, some scoring functions rank the ligand using indicators for a number of properties considered relevant to drug design in general. Along with shape and electrostatic complementarity, ligand physicochemical constraints like molecular weight, surface area, partition coefficient (log P), and even synthetic accessibility might be included in the function. It should be noted that not all of these evaluation parameters (e.g., molecular weight or synthetic accessibility) are related to binding affinity. A particularly elaborate scoring function of this type is used by Glen and Payne [63] in their GA-based method for ligand generation. Constraints are divided into three classes (scalar, surface and grid), the function being:
$$\text{Score} = \sum_{i = \text{class}} \; \sum_{j=1}^{n_i} W_{ij} E_{ij} \qquad (1)$$
where n_i is the number of constraints in class i, W is the weight of a particular constraint, and E its error from ideality. The physicochemical constraints mentioned above belong to the scalar class. A surface constraint is, for example, the electrostatic potential surface: the ligand and receptor surface points 'buried' between the two are compared in search of complementarity. The shape and size of the receptor cavity, together with hydrogen bonding sites, are grid-defined and therefore included in the remaining class. This scoring approach is used with simpler and fewer constraints in a number of other programs, like PRO_LIGAND [48] and HOOK [46], the latter being primarily concerned with shape-fitting (mainly penalising overlaps and bad contacts), since it assembles ligands from optimal, previously ranked fragments already in place. A general scheme to quickly estimate the binding affinity is to divide the free energy of association into three major components: electrostatic, hydrophobic and entropic. Some scoring functions also include an explicit hydrogen bonding component. The reader is referred to a review by Ajay and Murcko [64] for a detailed discussion of these scoring strategies and their use in drug design. In general terms, the electrostatic component is usually calculated as a simple Coulombic interaction, although some authors use a continuum solvent model (Poisson-Boltzmann) instead [65]. The hydrophobic component may be computed with parameters like log P and the molecular or solvent-accessible surface area. This assumes that hydrophobicity can be estimated from the energy needed to create a cavity in the aqueous solvent in which the ligand is
embedded. Upon receptor binding, this energy is released for a lipophilic ligand, entropically favouring the process. Finally, the conformational entropic component is determined by the number of rotatable bonds constrained to rigid conformations on complex formation. The appropriate scaling constants (weights) are derived from a calibration set of receptor-ligand complex structures and binding data. The most widely known scoring function for predicting the binding free energy is that of LUDI [36, 66], given by the following equation:
$$\Delta G_{\text{binding}} = \Delta G_0 + \Delta G_{\text{hb}} \sum_{\text{h-bonds}} f(\Delta R, \Delta\alpha) + \Delta G_{\text{ionic}} \sum_{\text{ionic int.}} f(\Delta R, \Delta\alpha) + \Delta G_{\text{lipo}} A_{\text{lipo}} + \Delta G_{\text{rot}} N_{\text{rot}} \qquad (2)$$

where ΔG_0 reflects the reduction in rotational and translational entropy, and the remaining ΔG terms are the contributions due to hydrogen bonding (hb), ionic interactions (ionic), lipophilic interactions (lipo) and the energy lost on freezing internal degrees of freedom of the ligand (rot). The hydrogen bond and ionic terms are summed over the interactions that occur between the ligand and the receptor (H-bonds and ionic pairs, respectively), and are weighted by a penalty function f that assesses deviations from ideal geometry in distance (ΔR) and angle (Δα); A_lipo is a quick grid-based approximation of the buried ligand-receptor surface area, and N_rot is the number of flexible torsions in the ligand. The ΔG weights were calibrated from analysis of a dataset of 45 protein-ligand PDB structures (for the actual values see reference 66). As we have seen in previous sections, some programs use force field-based scoring functions, typical examples being MCSS [34] and GROW [26]. They are usually adaptations of well-known classical mechanics force fields (AMBER [51], CHARMM [39]) that calculate pairwise intermolecular energies to assess ligand quality: severe distance cutoffs for the intra-receptor van der Waals and electrostatic interactions allow faster computation. A major problem with this approach is that, if taken alone, it produces a strong bias towards more hydrophilic ligands, since polar atoms will always score better than nonpolar ones, even when the ligand is docked in a hydrophobic binding site [31]. Accordingly, 'corrections' are introduced to account for (de)solvation effects (i.e., hydrophobicity), and may take the form of an accessible surface area term (the GROW option), a reward term for burying nonpolar atoms [52], or a dependence on known solvation energies of the structures involved (the MCSS option). The energy scoring component of DOCK [55-57] is also based on force field scoring.
The program offers the possibility of precomputing the van der Waals and electrostatic potentials for the receptor and
storing them on a grid over the docking site, thus speeding up the calculation and scoring of the interaction energies with the ligand conformers. The complete DOCK scoring scheme includes two additional, separate grids for contact and chemical description, the three scoring grids being applied independently. An alternative form of force field-based scoring is employed in later versions of GRID [33], the basis being statistically derived potentials (energy profiles for pairwise interactions as a function of distance) developed from analysis of crystallographic geometric parameters. It is worth mentioning that there are some aspects of receptor-ligand (enzyme-inhibitor) binding that scoring functions typically neglect. The most obvious one is the flexibility of the receptor. With few exceptions, ligand screening applications treat the receptor as rigid, with a single bioactive conformation, thus ignoring multi-conformational active site states, as in the case of lactate dehydrogenase [67]. Even when the protein conformation is taken from a co-crystallised complex, different ligands are bound to induce somewhat different accessible conformations of the target (induced fit). Add to this the fact that a good number of de novo design and docking programs perform limited sampling, or none at all, of the ligand conformations, and it becomes clear that in many cases several good hits will be missed. Another aspect is that typical scoring weights imply that contributions to binding affinity are additive and that each occurrence of a given basic contribution is equivalent. As a result, the same weight is assigned to all hydrogen bonds, translational/rotational entropic loss bears the same penalty whatever the trial ligand, and all frozen rotatable bonds score equally. This may hold true on average, but it clearly leaves no room for unpredictable and not so rare co-operative (synergistic) effects [68].
Finally, temperature, pH, ionic strength, etc., are not considered, yet it is not unusual for ligand affinity to depend on experimental conditions. Fast ligand prioritisation would no doubt benefit from some refinement in these and other aspects (synthetic accessibility and bioavailability, to mention just two more). There are only a few reports in the literature of actual designs produced by fast ligand generation/docking programs that have subsequently been synthesised and assayed. Synthetic chemists are still hesitant to validate such computer-suggested 'raw ideas', and some further modelling of the best-scoring protein-ligand complex(es) may be desirable for a refined assessment prior to synthesis.
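To make the additive structure of the LUDI function, equation (2), and of the Glen and Payne weighted-constraint sum concrete, a minimal sketch follows. It is not LUDI's implementation: the weights are illustrative placeholders with the sign pattern implied by the text (entropy and frozen-rotor terms unfavourable, H-bond, ionic and lipophilic terms favourable), not the calibrated values of reference 66, and the piecewise-linear penalty f with its tolerance windows is a hypothetical choice.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


def ramp(dev: float, full: float, zero: float) -> float:
    """1 up to `full`, 0 beyond `zero`, linear in between."""
    if dev <= full:
        return 1.0
    if dev >= zero:
        return 0.0
    return (zero - dev) / (zero - full)


def geometry_penalty(d_r: float, d_alpha: float,
                     r_window: Tuple[float, float] = (0.2, 0.6),
                     a_window: Tuple[float, float] = (30.0, 80.0)) -> float:
    """f(dR, d_alpha): product of a distance ramp (Angstrom) and an angle ramp
    (degrees). The tolerance windows are hypothetical defaults."""
    return ramp(abs(d_r), *r_window) * ramp(abs(d_alpha), *a_window)


@dataclass(frozen=True)
class Weights:
    """Placeholder weights (kJ/mol; lipo per square Angstrom), NOT calibrated values."""
    g0: float = 5.0       # loss of rotational/translational entropy
    hb: float = -5.0      # one ideal hydrogen bond
    ionic: float = -8.0   # one ideal ionic interaction
    lipo: float = -0.2    # per unit of buried lipophilic contact area
    rot: float = 1.5      # per frozen rotatable bond


def ludi_like_score(hbonds: Iterable[Tuple[float, float]],
                    ionic_pairs: Iterable[Tuple[float, float]],
                    a_lipo: float, n_rot: int, w: Weights = Weights()) -> float:
    """Additive form of equation (2); each interaction is given as (dR, d_alpha)."""
    return (w.g0
            + w.hb * sum(geometry_penalty(dr, da) for dr, da in hbonds)
            + w.ionic * sum(geometry_penalty(dr, da) for dr, da in ionic_pairs)
            + w.lipo * a_lipo
            + w.rot * n_rot)


def constraint_score(errors: Dict[str, List[float]], weights: Dict[str, List[float]]) -> float:
    """Glen-Payne style sum over classes and constraints of weight * error."""
    return sum(w * e for cls, errs in errors.items() for w, e in zip(weights[cls], errs))
```

The sketch also makes the limitation discussed above visible: every H-bond with the same geometry contributes exactly `w.hb`, whatever its chemical context, so co-operative effects cannot be represented.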
2.6. Refining the Enzyme-Ligand Structure
Given a suitable potential energy function [69], the most straightforward way to relax the enzyme-ligand structure, and to reach a local minimum on the potential energy surface of the complex, would be a simple MM energy
minimisation. A classical mechanics force field [39, 51, 70] is required because the size of protein-like systems does not allow a quantum-mechanical treatment of the overall energy optimisation, even when no explicit representation of the solvent is considered. A few energy minimisation steps are usually enough to correct small structural errors, like an occasional bump or internal parameter strain. The procedure is not dynamic (the calculation corresponds to 0 K) and is rather limited in scope, but it has proved useful as a first qualitative tool to rationalise and understand relative binding trends [71]. Qualitative insight into the stability of the system under dynamic conditions can be obtained from a reasonable sampling of the configurational space, analysed for a variety of structural and dynamic properties of the protein-ligand complex, and conventional MD simulations are commonly used for that purpose. MD will be discussed more thoroughly in the next section; only its application as a refinement tool for the enzyme-ligand structure is presented here. In conjunction with MD, it is also possible to perform simulated annealing (SA), in which the system is 'overheated' and then cooled periodically to let it relax into local minimum energy conformers. Traditionally, SA is a tool for refining X-ray and NMR structures, and for that purpose the potential function includes experimental constraints (X-ray structure factors or NMR NOE distances) [72]. For the purpose of verifying a modelled enzyme-inhibitor complex, a conventional protocol is to perform a few hundred picoseconds of MD on the bound structure only, typically in aqueous solution and at physiological temperature. A full solvent representation is generally unaffordable in terms of computational time, but a 'blob' of explicit water molecules around the complex, or alternatively just surrounding the binding site region, should be sufficient to account for some solvent effects.
Any crystallographic water molecule thought to be relevant (e.g., conserved buried waters) should also be included. The choice between letting the whole system relax, or freezing parts of it and allowing only the ligand plus some residues in the receptor active site to be flexible, must be made judiciously: complete mobility is computationally expensive and does not necessarily bring a significant improvement in the results, whereas applying constraints involves some arbitrary selection of residues, with no guarantee that the appropriate choice has been made. From the generated MD trajectory one can obtain average motions of atoms over time, average interaction energies for the bound structure, time correlations for atomic positions and velocities, and RMS geometric fluctuations to be compared with any available experimental data. A particular intermolecular distance, or the interaction energy between two functional groups
known to be critical for binding affinity, could also be monitored over the trajectory. Checking whether the average structural features comply with the originally proposed model also contributes to validating the predicted binding mode(s). The average value of the protein-ligand interaction energy can be indicative of the stability of the complex but is not directly comparable to the binding free energy. No extensive sampling is actually feasible and entropic effects are essentially omitted, so caution is advisable when comparing the intermolecular energies of two different designs (e.g., two inhibitors for the same enzyme), especially if the ligands differ significantly in chemistry or size. Despite these limitations, the simple MD protocol described above can provide valuable information about the energetic and dynamic complementarity of the ligand to its receptor. An illustrative example is the reported MD 'computational co-crystallisation' of the HIV enzyme with its potent inhibitor MDL 73,669 [73].
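Several of the trajectory analyses listed above (per-atom RMS fluctuations, their comparison with experimental B factors, a running average of the interaction energy, and a key distance monitored over time) reduce to short array operations. The sketch below assumes the trajectory is available as a NumPy array of shape (frames, atoms, 3) that has already been superimposed on a reference structure; the function names are hypothetical. The B-factor conversion uses the standard isotropic relation B = (8π²/3)⟨Δr²⟩.

```python
import numpy as np


def rmsf(traj: np.ndarray) -> np.ndarray:
    """Per-atom RMS fluctuation about the mean position.
    traj has shape (n_frames, n_atoms, 3) and is assumed to be already
    superimposed on a reference structure (global motion removed)."""
    mean = traj.mean(axis=0)
    return np.sqrt(((traj - mean) ** 2).sum(axis=2).mean(axis=0))


def bfactors_from_rmsf(rmsf_values: np.ndarray) -> np.ndarray:
    """Isotropic B factor from the mean-square fluctuation: B = (8 pi^2 / 3) <dr^2>."""
    return (8.0 * np.pi ** 2 / 3.0) * np.asarray(rmsf_values) ** 2


def running_mean(series) -> np.ndarray:
    """Cumulative average, e.g. of the ligand-receptor interaction energy per
    frame, to check whether the average has levelled off."""
    values = np.asarray(series, dtype=float)
    return np.cumsum(values) / np.arange(1, len(values) + 1)


def pair_distance_series(traj: np.ndarray, i: int, j: int) -> np.ndarray:
    """Distance between atoms i and j in every frame (e.g. a key hydrogen bond)."""
    return np.linalg.norm(traj[:, i, :] - traj[:, j, :], axis=1)
```

A running mean that is still drifting at the end of a few hundred picoseconds is a warning that the averaged interaction energy is not yet meaningful, which echoes the caution above about limited sampling.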
3. THE ENZYME-LIGAND COMPLEX IN MOTION
Having found a possible bound conformation for the receptor-ligand system of interest (either from the experimental structure of a co-crystallised complex, a model proposed by a de novo design or docking program, or even one obtained by 'manual' design), it is often useful to examine the motions the complex may undergo. The goals may range from assessing the stability of the binding mode(s) to more accurate calculations of the binding affinity and predictions of catalytic mechanisms, all of these aspects involving enzyme-ligand interactions. The relevant approaches available for such studies will now be discussed.
3.1. Monte Carlo and Molecular Dynamics Simulations
Proteins are intrinsically dynamic systems, and several lines of evidence for this behaviour come from structure determination techniques. For example, B factors, determined by X-ray crystallography or neutron scattering, reflect the fluctuations of atoms about their average positions [74, 75], the internal motions of proteins can be detected by NMR [76, 77], and it is not unusual for different X-ray structures of the same protein to show conformational variations [78, 79]. The occurrence of large conformational fluctuations in proteins has also been observed with several other experimental methods, such as hydrogen exchange [80], fluorescence quenching [81] and laser flash photolysis [82]. The molecular motions in proteins cover a wide range of time scales, from the very fast C-H bond stretching vibrations, with relaxation times of about 10^-14 s, to the very complex and slow protein folding processes, which can take up to hours [83, 84], and they play an important role in the biological activity of proteins [85, 86].
Computer simulation has become a very useful and flexible tool for understanding these motions at the molecular level, particularly because suitably detailed experimental techniques are, in general, not available. MD and MC with associated classical potential energy functions are the most frequently used methods for the simulation of proteins. MD is a deterministic approach that calculates the system trajectory (conformation as a function of time) by numerical integration of the equations of motion. By contrast, MC adopts a stochastic methodology, whereby atoms are randomly moved during the course of the simulation and the rules of acceptance for the generated conformations are usually defined by the Metropolis algorithm. Most potential energy functions used in molecular simulation studies are based on an empirical representation of Born-Oppenheimer surfaces [87]. The ground state of a molecular system is described by a continuous potential energy function of the coordinates of its atoms, treated as point charges, with the following general form:

$$V = V_{\text{bonded}} + V_{\text{nonbonded}} \qquad (3)$$
V_bonded is the bonded component associated with the relevant internal coordinates (bond lengths, valence, dihedral and improper angles, etc.), and V_nonbonded is the non-bonded component, which is commonly described by two-body additive terms for the interactions between pairs of atoms separated by three or more bonds (electrostatic, van der Waals and, in some cases, hydrogen bond interactions). Such additive potentials have simple analytical forms [39, 51, 88] which require moderate computational resources for their implementation in molecular simulations (MD or MC) of biological systems. Despite their simplicity, it has been shown that these types of potential can accurately reproduce many experimental properties of various molecular systems [89]. However, they often fail to reproduce ionic association and other processes in which the atoms are subjected to particularly large variations of the electric field [89-95]. This problem may be overcome by including non-additive effects, for which two main methods can be used: the introduction of additional many-body polarisation effects within an atom-centred model [89-92], and a more rigorous description of the electrostatic interactions through a distributed multipole analysis [93-95]. Another problem arises from the difficulty a monopole model has in reproducing the conformational variations of atomic charges [96-98], and some methodologies have been introduced to derive conformationally transferable charges [97, 98]. Unfortunately, such improvements have not been easy to implement in molecular simulations of protein systems because of their very large number of atoms. For proteins and other large molecular systems, with a large number of internal coordinates described by harmonic components, MC is generally less
efficient than MD for sampling the conformational space [99]. Nevertheless, MC has the advantage of allowing a much easier manipulation of the degrees of freedom, and it works well for side chain sampling [100]. Moreover, its use in biomolecular simulations has been encouraged by the development in the last few years of specific 'smart' algorithms with improved efficiency [101, 102]. MD simulations are performed naturally in the microcanonical (NVE) ensemble, where the number of atoms, N, the volume, V, and the energy, E, are constant and the characteristic thermodynamic function is the entropy. However, the results of molecular simulations are to be compared with values obtained in experiments carried out under conditions closer to the canonical (NVT) or, more frequently, the isobaric-isothermal (NPT) ensembles. Furthermore, the characteristic thermodynamic functions of these ensembles, the Helmholtz and Gibbs free energies respectively, are much easier to compute than the entropy [103]. While the reproduction of the NVT and NPT ensembles is straightforward in an MC simulation, additional algorithms to control pressure and/or temperature need to be introduced in the MD case. The most widely used are Berendsen's method [104], with weak coupling to an external bath obtained through modified Langevin equations, and the 'extended system' formalisms [105-107], which involve the introduction of additional dynamic variables associated with external baths. Solvation plays a crucial role in the structure and dynamics of proteins. Molecular simulations of moderately sized biomolecular systems in a box of water with periodic boundary conditions are becoming computationally feasible [108-116], but this solvation approach is still prohibitive for many protein systems. Alternatively, active region methods have been extensively used to enable the study of more complex systems [109, 110, 117-129].
Within this formalism, the model system is subdivided into an active region, centred on the zone of greatest interest and simulated by full MD or MC; a boundary region, where the atoms are constrained by stochastic [109, 118-122, 125, 128] and/or harmonic forces [109, 110, 117-128]; and an inert region, where the residues are fixed at their starting positions [109, 117, 122, 123, 126-128] or even deleted [110, 118-121, 124, 125]. The solvent is usually 'reduced' to a water shell embedding both the active and boundary regions, and care must be taken to ensure that the simulation is as realistic as possible. Despite the large approximations involved, active region molecular simulations have often produced results of quality comparable to more sophisticated methodologies [130]. Under the most favourable conditions, molecular simulations could be used to calculate the binding free energy (ΔF) of a ligand (L) to an enzyme (E). It would be necessary to simulate the corresponding association reaction,

$$\mathrm{E(aq)} + \mathrm{L(aq)} \xrightarrow{\;\Delta F\;} \mathrm{E{-}L(aq)}, \qquad (4)$$
from the initial state, in which both molecules are isolated and do not interact with each other, to the final state, where the enzyme-ligand encounter and binding occur; such a simulation is computationally prohibitive for most biological systems. In fact, the steps involved in the association process, including diffusional encounter in the solvent environment, conformational rearrangements in both partners, desolvation and binding, occur on time scales that are usually beyond the computational resources presently available [3]. In practice, most simulations are carried out solely on the enzyme-ligand complex to clarify the nature of the binding process. The ability of conventional molecular simulations to explore the enzyme-ligand conformational space has provided a very insightful tool for improving knowledge of the structural requirements for ligand binding, with many examples reported in the literature [131-148]. For instance, enzymes are known to adopt several conformations of which only a few allow ligand docking, and some have buried active sites located in channels or clefts whose widths fluctuate as part of their internal motions. The entry of a ligand into such binding sites depends on these dynamic structural fluctuations [131], and this may result in a time-dependent reactivity upon contact. Binding will be successful if the ligand encounters the enzyme while the active site is accessible, and the probability of a favourable encounter increases if the 'gate' switches rapidly between open and closed states. Acetylcholinesterase, for example, has an active site located in a narrow, deep gorge whose entrance is completely blocked by five aromatic side chains. MD simulations performed by Wlodek et al. [132] revealed fluctuations in the active site channel of this enzyme that are large enough to allow the entry of a substrate, and also detected the frequent formation of alternative entrances. Additionally, Zhou et al.
[133] have shown, for an acetylcholine substrate probe model complexed with the same enzyme, that the 'gate' is open only 2.4% of the simulation time, a limitation partly compensated by the rapid gating dynamics. Another interesting example is cytochrome P450, whose buried active site has its entrance blocked by three aromatic side chains. The possible substrate access channels have been identified by Ludemann et al. [134] by combining a thermal motion pathway analysis with MD simulations. Moreover, it has been suggested that the high activity of piperonylbutoxide inhibitors is related to the ability of their long side chains to block the active site entrance of this enzyme [135]. The conformational flexibility of the binding region may also play an important role in enzyme specificity. Fluctuations in the enzyme structure include variations in the positions of salt bridges, hydrogen bonds and other attractive groups that are associated with ligand selectivity. Both MD [149-152] and MC [153] simulations have been carried out to analyse the
importance of these structural fluctuations for the ability of some enzymes to discriminate between different ligands. Of particular interest is the study of the biological mechanisms associated with enzyme stereoselectivity and enantioselectivity. For example, MD simulations have been successful in explaining the different affinities of trypsin and acetylcholinesterase for the diastereomers of soman inhibitors [154] and the ability of subtilisin Carlsberg and α-chymotrypsin to discriminate between the R- and S-configurations of chiral aldehyde inhibitors [155, 156]. Other applications of MD and MC simulations include identifying the most probable stabilising interactions in enzyme-ligand complexes [157-160], serving as guiding tools in the design of new specific inhibitors [161-163], and the molecular modelling of inhibition processes [164-171]. Finally, it should be pointed out that the trajectory files produced by MD simulations are large and create storage problems. Classical lossless compression algorithms achieve poor compression on this type of file. Therefore, specific lossy algorithms, which significantly increase compression efficiency while preserving a high degree of precision, are of great importance in addressing this problem [172-174].
3.2. Continuum Electrostatic Methods and Brownian Dynamics
For most enzyme-ligand systems, conventional molecular simulations can be used only to study processes spanning relatively short periods of time (from ps to a few ns), which makes them unsuitable for treating the many enzyme-ligand association events thought to occur on much longer time scales. For the purpose of simulating such enzymatic processes, some approximate techniques can be used, provided that atomic detail is not essential to describe all the components of the system.
Although dynamic conformational fluctuations are usually of fundamental importance for a correct understanding of enzyme-ligand association, static binding models may be sufficiently accurate to evaluate relative affinities in cases such as the study of topologically similar ligands for a common target enzyme [175, 176]. In such situations, the problem might essentially be 'reduced' to the evaluation of the electrostatic interactions that allow the enzyme to discriminate between different ligands [177, 178]. These interactions are frequently evaluated as simple Coulombic energies, but may be more accurately calculated with hybrid discrete-continuum models that take into account both pairwise and desolvation effects [100]. In this context, continuum electrostatic methods, such as the protein dipoles Langevin dipoles [179, 180] and Poisson-Boltzmann [181, 182] models, have been used as
efficient alternatives to the MD and MC formalisms discussed in the previous subsection. Within the continuum electrostatic formalism, tight ligand binding to an enzyme results mainly from an optimal balance between two terms: the unfavourable electrostatic penalty associated with desolvation, and the specific favourable interactions established in the enzyme-ligand complex [65, 177, 178, 183]. This idea is consistent with results suggesting that salt bridges and hydrogen bonds present a more favourable electrostatic balance in biomolecular recognition events than in processes such as protein folding [177, 178, 184-186]. The formalism has been used to rationalise the relative affinities of different inhibitors for thermolysin [187], of hirudin for various thrombin mutants [188], and of substituted sulfonamides for carbonic anhydrase [189]. However, estimating absolute binding free energies requires additional terms that account for the hydrophobic effects associated with the reduction of solvent-accessible area upon enzyme-ligand association [190-192]. The continuum electrostatic approach has also been considered appropriate for biomolecular systems in two situations. The first is when electronic polarisation effects dominate over conformational flexibility, since these effects are typically neglected by the additive pairwise potentials commonly used in MD and MC. The second is when changes in the protonation states of titratable sites [193-198] or electron-proton coupling phenomena [199-204] occur.

For certain enzymes, the non-diffusive steps have been optimised to such an extent that the rates of the corresponding ligand association processes approach the limit of diffusion control [205, 206]. Evidence for this type of behaviour comes from their high catalytic rate constants, which depend clearly on the viscosity and ionic strength of the solvent [206, 207].
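For orientation, the diffusion-controlled limit referred to above is often estimated with the classical Smoluchowski expression for two uniformly reactive spheres, $k_D = 4\pi N_A D R$. Here D is the relative diffusion coefficient and R is the encounter distance. This expression is a textbook result, not a formula from this chapter. It ignores electrostatic steering and orientational requirements, which can push real association rates above or below it. The diffusion coefficient and distance in the example are hypothetical values chosen only for illustration.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1 (exact SI value)

def smoluchowski_rate(d_rel_m2_s: float, r_encounter_m: float) -> float:
    """Diffusion-limited association rate constant k_D = 4*pi*N_A*D*R, in M^-1 s^-1.

    d_rel_m2_s    : relative diffusion coefficient D = D_enzyme + D_ligand (m^2/s)
    r_encounter_m : encounter (contact) distance R (m)
    """
    k_m3_per_mol_s = 4.0 * math.pi * AVOGADRO * d_rel_m2_s * r_encounter_m
    return k_m3_per_mol_s * 1.0e3  # 1 m^3 = 1000 L, so m^3 mol^-1 s^-1 -> M^-1 s^-1

# Hypothetical small ligand (D ~ 1e-9 m^2/s) meeting an enzyme at R = 1 nm
k_d = smoluchowski_rate(1.0e-9, 1.0e-9)
print(f"k_D = {k_d:.2e} M^-1 s^-1")
```

The result is of the order of 10^9 to 10^10 M^-1 s^-1, the range usually associated with diffusion-controlled enzymes.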
Brownian dynamics, the diffusional analogue of MD, is carried out by numerical integration of Langevin's equations of motion. It has been used as a convenient approach to simulate such optimised associations [205-221]. In this methodology, the most important components of the molecular system are treated explicitly at the atomic level, while the remaining components are represented by continuum models. The latter usually comprise the solvent and other groups whose detailed motions are not essential for studying diffusive processes. The degrees of freedom eliminated in this way are included in a thermal bath, which interacts with the discrete part of the system through both frictional and stochastic forces. The Poisson-Boltzmann continuum electrostatic model is the most widely used method to describe interactions within the hybrid discrete-continuum system in Brownian dynamics simulations [182]. Once again, the large simplification resulting from the reduction of atomic detail enables the study of processes that occur on much longer time scales.
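The propagation step at the heart of such simulations can be sketched with the Ermak-McCammon algorithm. This is a standard overdamped Brownian dynamics integrator; the chapter does not specify one, so the choice is an assumption. Without hydrodynamic interactions, each step is $\mathbf{r}(t+\Delta t) = \mathbf{r}(t) + (D/k_BT)\,\mathbf{F}\,\Delta t + \sqrt{2D\Delta t}\,\boldsymbol{\xi}$, where:

- the drift term carries the systematic force damped by friction, with friction coefficient $\gamma = k_BT/D$;
- $\boldsymbol{\xi}$ holds standard normal random numbers representing the stochastic forces of the bath.

The harmonic force in the example is a hypothetical stand-in for the Poisson-Boltzmann forces of a real enzyme. It was chosen because its equilibrium value $\langle r^2\rangle = 3k_BT/k$ is known exactly.

```python
import numpy as np

def bd_step(r, force, D, kT, dt, rng):
    """One Ermak-McCammon Brownian dynamics step (no hydrodynamic interactions).

    r : (n_particles, 3) positions; force(r) returns forces of the same shape.
    The drift (D/kT) F dt is friction-damped motion under the systematic force;
    sqrt(2 D dt) * xi is the random kick from the implicit solvent bath.
    """
    xi = rng.standard_normal(r.shape)
    return r + (D / kT) * force(r) * dt + np.sqrt(2.0 * D * dt) * xi

def run(force, n_particles=5000, n_steps=1000, D=1.0, kT=1.0, dt=0.01, seed=0):
    """Propagate independent particles from the origin for n_steps BD steps."""
    rng = np.random.default_rng(seed)
    r = np.zeros((n_particles, 3))
    for _ in range(n_steps):
        r = bd_step(r, force, D, kT, dt, rng)
    return r

# Free diffusion: the mean-square displacement should approach 6 D t (here t = 1).
r_free = run(lambda r: np.zeros_like(r), n_steps=100)
msd = float(np.mean(np.sum(r_free**2, axis=1)))

# Hypothetical harmonic restoring force F = -k r with k = 1: <r^2> -> 3 kT / k.
r_trap = run(lambda r: -1.0 * r, n_steps=1000)
r2 = float(np.mean(np.sum(r_trap**2, axis=1)))
print(f"MSD = {msd:.2f} (expected 6), <r^2> = {r2:.2f} (expected ~3)")
```

In production codes the force comes from a precomputed electrostatic grid, and reaction criteria are checked at every step. Both are omitted here.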
Brownian dynamics has been used to study the kinetics of a variety of enzyme-ligand diffusional encounters and to clarify the very important role that electrostatic interactions play in such processes. For example, consider the binding of N-methylacridinium, a positively charged inhibitor, to acetylcholinesterase. Its rate constant has been shown to rise significantly as the absolute value of the negative charge on the enzyme increases [208], and electrostatic steering of the ligand has also been observed to contribute to the high rate constant obtained. The same methodology has been applied to both the wild-type enzyme and its Glu199→Gln mutant [209]. These studies revealed that the mutant's markedly lower catalytic efficiency towards the positively charged substrate acetylcholine seemed to be partially associated with the negatively charged residue 199, located near the binding region. It has been argued that the electrostatic potential induced by this residue favours the diffusion of the substrate through the deep gorge and into the active site. Also worth mentioning are the results reported by Wade et al. [207, 210] for a series of diffusion-controlled binding processes, which point to a common mechanism for steering charged ligands via a conserved electrostatic potential at the enzymes' active sites. Moreover, the presence of salt links appears to influence even the binding of non-polar ligands. Other Brownian dynamics studies have shown that some neutral enzymes display a non-uniform charge distribution that properly steers the ligand into the active site [211]. Effects such as hydrodynamics [205, 212] and conformational fluctuations [213, 220] have been observed to have only a moderate influence on the rate constants of diffusion-controlled enzyme-ligand association processes.
3.3. Rigorous Free Energy Simulations
From statistical mechanics, it follows that free energy is an extensive thermodynamic property, which depends on the extent of the accessible phase space and governs the spontaneity of chemical and biochemical reactions. As mentioned previously, rigorous calculations of absolute binding free energies (ΔF) are computationally infeasible for most enzyme-ligand complexes. However, free energy simulation techniques applied to the appropriate generic thermodynamic cycles can determine relative binding free energies (ΔΔF) in two settings: mutated (ME) and wild-type (E) enzymes binding a common ligand (L), or mutated (ML) and original (L) ligands binding a common target enzyme (E).
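$$
\begin{array}{ccc}
\mathrm{E(aq)} + \mathrm{L(aq)} & \xrightarrow{\;\Delta F_1\;} & \mathrm{E\text{-}L(aq)} \\
\Big\downarrow \Delta F_3 & & \Big\downarrow \Delta F_4 \\
\mathrm{ME(aq)} + \mathrm{L(aq)} & \xrightarrow{\;\Delta F_2\;} & \mathrm{ME\text{-}L(aq)}
\end{array}
\qquad \text{or} \qquad
\begin{array}{ccc}
\mathrm{E(aq)} + \mathrm{L(aq)} & \xrightarrow{\;\Delta F_1\;} & \mathrm{E\text{-}L(aq)} \\
\Big\downarrow \Delta F_3 & & \Big\downarrow \Delta F_4 \\
\mathrm{E(aq)} + \mathrm{ML(aq)} & \xrightarrow{\;\Delta F_2\;} & \mathrm{E\text{-}ML(aq)}
\end{array}
\qquad (5)
$$
From either of the cycles in equation (5), it follows that
$$
\Delta\Delta F = \Delta F_2 - \Delta F_1 = \Delta F_4 - \Delta F_3, \qquad (6)
$$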
and the desired quantity can be obtained by simulating the non-physical processes 3 and 4, provided that the original and the mutated species are not too different. This standard thermodynamic cycle methodology, and some similar formalisms, have been extensively used to predict and interpret several enzyme-ligand binding processes [117, 127, 222-242]. Applications include:

- the design and/or evaluation of new synthetic inhibitors for thermolysin [223], carbonic anhydrase [117, 224], dihydrofolate reductase [225-228], HIV protease [127, 230-232] and thrombin [234];
- studies on the selectivity of dihydrofolate reductase [229] towards different naturally occurring substrates;
- the assessment of the relative importance of specific amino acid interactions for the stability of trypsin-inhibitor complexes [237, 238].

In the latter studies, Melo and Ramos [237] and Ramos et al. [238] performed amino acid mutations designed to destroy the specificity of a given interaction while preserving the three-dimensional structure of the original species. The original and mutated residues therefore had to be topologically similar but chemically opposite. It was concluded that the relative binding free energies (ΔΔF) depend both on the interactions involved and on the environments in which they occur. In the association process, a large number of water molecules are removed from the enzyme and the ligand to form the enzyme-ligand complex. For a given (A, B) residue pair, the overall stabilisation effect depends on a balance between two contributions: the hydrogen bonds formed by residues A and B with the solvent in the initial state (E(aq) + L(aq)), and the specific A-B interaction in the E-L(aq) complex.
The environment in which this interaction occurs also plays an important role. In a hydrophilic environment, residues A and B can form many interactions with the solvent and with other polar or charged residues that stabilise the complex.

Thermodynamic integration (TI) and thermodynamic perturbation (TP) have been the most widely used methodologies in the above-mentioned free
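energy simulations. According to the TI formalism [243], the free energy difference (ΔF) between an initial state R and a final state P is given by
$$
\Delta F = F_P - F_R = \int_0^1 \left\langle \frac{\partial V_\lambda}{\partial \lambda} \right\rangle_\lambda d\lambda \qquad (7)
$$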
where λ is a coupling parameter varying from 0 in the initial state to 1 in the final state, and $V_\lambda$ is the potential energy function of the hybrid state specified by λ. Each ensemble average $\langle \partial V_\lambda/\partial\lambda \rangle_\lambda$ is determined by a molecular simulation (MD or MC) performed with the hybrid potential $V_\lambda$:
$$
V_\lambda = (1-\lambda)^m V_R + \lambda^m V_P \qquad (8)
$$
In equation (8), $V_R$ and $V_P$ are the potential energy functions of the initial and final states, respectively, and m specifies the type of coupling between $V_\lambda$ and λ: linear for m = 1 and non-linear for m > 1. A finite number of simulations, usually termed 'windows', is used to change λ from its initial to its final value, and the free energy difference in (7) is obtained by numerical integration [244, 245]. The efficiency of a TI calculation can be improved with the interpolation method of Brooks [246]. In this method, an ensemble average at a hybrid state $\lambda_i$ is calculated from conformations sampled at another hybrid state $\lambda_{i'}$. The ensemble average $\langle \partial V_\lambda/\partial\lambda \rangle_{\lambda_i}$ is then evaluated as
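$$
\left\langle \frac{\partial V_\lambda}{\partial \lambda} \right\rangle_{\lambda_i} = \frac{\left\langle \left(\frac{\partial V_\lambda}{\partial \lambda}\right)_{\lambda_i} \exp\left[\beta\left(V_{\lambda_{i'}} - V_{\lambda_i}\right)\right] \right\rangle_{\lambda_{i'}}}{\left\langle \exp\left[\beta\left(V_{\lambda_{i'}} - V_{\lambda_i}\right)\right] \right\rangle_{\lambda_{i'}}} \qquad (9)
$$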
Here $\beta = 1/k_BT$, where $k_B$ is the Boltzmann constant and T is the absolute temperature. The important consequence is that the number of windows in a TI calculation can be significantly reduced. Only a limited number of simulations are performed, with the potentials $V_{\lambda_{i'}}$. The ensemble averages $\langle \partial V_\lambda/\partial\lambda \rangle_{\lambda_i}$ for the remaining hybrid states $\lambda_i$ are then interpolated with equation (9) [119, 121].
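Equations (7)-(9) can be illustrated on a toy system whose answer is known exactly. The model couples two one-dimensional harmonic wells, $V_R = \tfrac12 K_R x^2$ and $V_P = \tfrac12 K_P x^2$ (hypothetical force constants), linearly as in equation (8) with m = 1, which gives $\Delta F = \tfrac12 k_BT\ln(K_P/K_R)$. Gaussian samples drawn from each hybrid state stand in for an MD or MC window. The sketch compares two estimates:

- plain TI on a 21-point λ grid;
- a Brooks-style estimate in which only 5 states are simulated and the remaining averages are obtained from equation (9), reweighting samples from the nearest simulated state.

Trapezoidal integration is an assumed choice of quadrature.

```python
import numpy as np

KT = 1.0             # k_B T in reduced units (assumption)
K_R, K_P = 1.0, 4.0  # hypothetical force constants of the initial (R) and final (P) wells

def k_of(lam):
    return (1.0 - lam) * K_R + lam * K_P        # eq. (8) with m = 1 for V = 0.5 k x^2

def potential(x, lam):
    return 0.5 * k_of(lam) * x**2

def dv_dlambda(x):
    return 0.5 * (K_P - K_R) * x**2             # V_P - V_R, independent of lambda when m = 1

def sample(lam, n, rng):
    # exact Boltzmann sampling of the 1D well stands in for one MD/MC window
    return rng.normal(0.0, np.sqrt(KT / k_of(lam)), size=n)

def trapezoid(xs, ys):
    """Numerical integration of eq. (7) over the lambda grid."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    return float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)))

def reweighted_average(x, lam_sim, lam_target):
    """Eq. (9): samples from lam_sim (lambda_i') reweighted to lam_target (lambda_i)."""
    a = (potential(x, lam_sim) - potential(x, lam_target)) / KT
    w = np.exp(a - a.max())                     # constant shift cancels in the ratio
    return float(np.sum(w * dv_dlambda(x)) / np.sum(w))

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 21)
exact = 0.5 * KT * np.log(K_P / K_R)

# Plain TI: one simulation per point of the 21-point grid.
dF_ti = trapezoid(grid, [dv_dlambda(sample(l, 20000, rng)).mean() for l in grid])

# Interpolated TI: 5 simulations, the other grid averages from eq. (9).
sim_lams = np.linspace(0.0, 1.0, 5)
sims = [sample(l, 20000, rng) for l in sim_lams]
averages = []
for l in grid:
    j = int(np.argmin(np.abs(sim_lams - l)))    # nearest simulated state lambda_i'
    averages.append(reweighted_average(sims[j], sim_lams[j], l))
dF_interp = trapezoid(grid, averages)

print(f"TI (21 windows): {dF_ti:.3f}  TI (5 windows + eq. 9): {dF_interp:.3f}  exact: {exact:.3f}")
```

Reweighting works here because neighbouring harmonic states overlap strongly. For states that overlap poorly, the weights in equation (9) become dominated by a few samples, and more simulated windows are needed.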
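The other option mentioned above, the TP formalism [247], is usually associated with the definition of a series of M hybrid states $\lambda_i$. For these, M−1 free energy differences $\Delta F_{\lambda_i \to \lambda_{i+1}}$ between the hybrid states $\lambda_i$ and $\lambda_{i+1}$ are considered, each calculated as
$$
\Delta F_{\lambda_i \to \lambda_{i+1}} = F_{\lambda_{i+1}} - F_{\lambda_i} = -k_B T \ln \left\langle \exp\left[-\beta\left(V_{\lambda_{i+1}} - V_{\lambda_i}\right)\right] \right\rangle_{\lambda_i} \qquad (10)
$$
Accordingly, M−1 simulations (windows) are performed, each using the corresponding hybrid potential energy $V_{\lambda_i}$ to determine the exponential ensemble average $\langle \exp[-\beta(V_{\lambda_{i+1}} - V_{\lambda_i})] \rangle_{\lambda_i}$. Finally, the $\Delta F_{\lambda_i \to \lambda_{i+1}}$ contributions are added together to obtain the total free energy difference (ΔF):
$$
\Delta F = F_P - F_R = \sum_{i=1}^{M-1} \Delta F_{\lambda_i \to \lambda_{i+1}} \qquad (11)
$$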
The number of windows can be significantly reduced when double-wide sampling [248] is adopted. A molecular simulation is carried out for each hybrid state $\lambda_i$, and the free energy variations to the previous state $\lambda_{i-1}$ and to the next state $\lambda_{i+1}$ are both evaluated from the same trajectory. ΔF is then calculated as
$$
\Delta F = F_P - F_R = \sum_{i=1}^{L} \left( \Delta F_{\lambda_i \to \lambda_{i+1}} - \Delta F_{\lambda_i \to \lambda_{i-1}} \right) \qquad (12)
$$
where the sum runs over the L simulated hybrid states $\lambda_i$.
Regardless of the formalism (TI or TP), the end-point regions ($\lambda \to 0$ or $\lambda \to 1$) present specific convergence problems for the ensemble-averaged thermodynamic quantities whenever some atoms are annihilated [103]. As the potential energy of these atoms goes to zero, they feel almost no forces and undergo abnormally large displacements. These displacements can give rise to bad van der Waals and electrostatic contacts [103, 249-254], along with a substantial increase in the harmonic bonded energy components [244, 255-258]. Several methodologies have been introduced to avoid these inherent singularities, sometimes referred to as the 'λ goes to end-points catastrophe' problem [244, 249-252, 256-258]. The numerical instability associated with the non-bonded terms can be partially avoided with a non-linear coupling between $V_\lambda$ and λ, as suggested by Mezei and Beveridge [249], Cross [250] and Mezei [251]. An alternative, more general and stable method proposed by Beutler et al. [252] combines a non-linear coupling
approach with a soft-core potential energy function that prevents undesirable collisions with the annihilated atoms. Pearlman [244] and Pearlman and Kollman [256] developed a specific methodology to avoid the singularity problem inherent in mutations involving changes in bond lengths. In this methodology, the free energy associated with these changes is calculated with the corresponding bond lengths constrained. A potential of mean force (PMF) correction is then introduced to account for the imposed constraints. To handle both bond angles and bond lengths, Boresch and Karplus [257, 258] suggested a more general approach that introduces vibrational, PMF-type and Jacobian-factor corrective terms for the two components. Although no general formalism exists to solve the end-point singularities, an appropriate combination of the methods discussed above can provide a reasonable solution to the problem.

The TP and TI methods have been observed to produce comparable results [103, 121, 237, 238, 259-262], although each has specific convergence problems associated with its particular implementation. In the TP formalism, the ensemble averages of type (10) depend on both hybrid states $\lambda_i$ and $\lambda_{i+1}$ but are calculated using only the potential $V_{\lambda_i}$. This can lead to very slow convergence of the thermodynamic quantities involved if the two states do not overlap well [103, 244, 256, 257, 259, 263]. The TI method has no such problem, because the averages of type (7) are evaluated at the same hybrid state at which the molecular simulation is carried out [244, 245, 257-259]. TI does, however, have specific precision problems, since it involves numerical integration over a finite number of points. It also has far more difficulty sampling the end-point regions than the TP formalism [103].
A major advantage of TI is that it allows the free energy to be partitioned into additive components. This free energy component analysis has been extensively used to interpret free energy changes at the molecular level [108, 118-121, 240, 264-266]. These components depend on the simulation path between the initial and final states [267-273], so care must be taken in interpreting their physical meaning.
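The soft-core idea of Beutler et al. [252] mentioned above can be illustrated for a Lennard-Jones site that is switched off as λ goes from 0 to 1. The form used below is one widely used soft-core variant:
$$
V_{sc} = 4\epsilon(1-\lambda)^n\left\{[\alpha\lambda^2 + (r/\sigma)^6]^{-2} - [\alpha\lambda^2 + (r/\sigma)^6]^{-1}\right\}
$$
It should be checked against [252] before use, and α and n are illustrative choices rather than values from this chapter. The potential reduces to the ordinary Lennard-Jones form at λ = 0 and stays finite at r = 0 whenever λ > 0. This finiteness is what removes the end-point collisions.

```python
import numpy as np

def lennard_jones(r, eps=1.0, sigma=1.0):
    """Ordinary 12-6 Lennard-Jones potential; diverges as r -> 0."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6**2 - sr6)

def soft_core_lj(r, lam, eps=1.0, sigma=1.0, alpha=0.5, n=1):
    """Soft-core LJ for a site annihilated as lam goes from 0 (fully present) to 1 (absent).

    The alpha * lam**2 term in the denominator keeps the potential finite at r = 0
    whenever lam > 0. alpha and n are illustrative values (assumptions).
    """
    s = alpha * lam**2 + (r / sigma) ** 6
    return 4.0 * eps * (1.0 - lam) ** n * (1.0 / s**2 - 1.0 / s)

r = np.array([0.1, 0.5, 0.9, 1.12, 1.5])
print("plain LJ        :", np.round(lennard_jones(r), 2))
for lam in (0.0, 0.5, 0.9):
    print(f"soft-core l={lam}:", np.round(soft_core_lj(r, lam), 2))
```

At r = 0.1 the plain potential is of order 10^12 (reduced units), whereas the soft-core values for λ > 0 remain of order 10^2 or less.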
3.4. Approximate Free Energy Simulations
The rigorous calculation of relative binding free energies (ΔΔF) is computationally very expensive, because it involves simulating a series of intermediate hybrid states, each requiring adequate configurational sampling. This major drawback gave rise to a new generation of methods for estimating binding affinities that are faster but less accurate. These methods use MD or MC sampling only for the 'relevant' states of the system.
One method for predicting binding free energies is the linear interaction approximation (LIA) introduced by Åqvist et al. [274], which is based on linear response theory. For a ligand binding to an enzyme, molecular simulations are performed for two separate states: the free ligand in solution and the enzyme-ligand complex in solution. ΔF is then obtained from the change, between the unbound (free) and bound states, in the ensemble-averaged ligand-surroundings (L-S) interaction energies, both van der Waals and electrostatic:
$$
\Delta F = \alpha\, \Delta\langle V^{\mathrm{vdw}}_{L\text{-}S}\rangle + \beta\, \Delta\langle V^{\mathrm{elec}}_{L\text{-}S}\rangle \qquad (13)
$$
In equation (13), α and β are empirical scaling parameters derived by fitting experimental data for a series of inhibitors [274-276]. The labels ub and b refer to the unbound and bound states, respectively, and each $\Delta\langle V^{x}_{L\text{-}S}\rangle$ (x being vdw or elec) is given by
$$
\Delta\langle V^{x}_{L\text{-}S}\rangle = \langle V^{x}_{L\text{-}S}\rangle_{b} - \langle V^{x}_{L\text{-}S}\rangle_{ub} \qquad (14)
$$
LIA has three advantages over the classical (TP and TI) free energy simulation methods:

- (i) 'absolute' binding free energies are computed (ΔF instead of ΔΔF);
- (ii) the intermediate hybrid states are eliminated and only the end-points are sampled;
- (iii) a physical path is used between the initial and final states, which avoids the 'λ goes to end-points catastrophe' problem.

However, the model involves a certain degree of empiricism, and various authors have in fact derived somewhat different values for the scaling parameters α and β [274-276]. Jorgensen and co-workers [277, 278] proposed an extension of the LIA formalism that adds a term proportional to the solute's solvent-accessible area, and Hansson et al. [279] later introduced a generalisation of the model. Muegge et al. [280, 281] introduced a new technique combining the LIA formalism with the protein-dipole Langevin-dipole continuum electrostatic model, which allows fast estimation of group electrostatic contributions to the binding free energies. LIA-like methods have been used extensively to study the binding of a series of similar ligands to common target enzymes, such as endothiapepsin [274, 279], cytochrome P450cam [276], thrombin [282, 283], HIV protease [284], trypsin [285] and dihydrofolate reductase [286]. The free energy derivatives formalism [287-289] also provides free energy estimates and is based on the following equation:
$$
\Delta F \approx \left( \frac{\partial F}{\partial \alpha} \right)_{\alpha_0} \Delta\alpha \qquad (15)
$$
where α is an atomic parameter (charge, van der Waals parameters, etc.) with a reference value $\alpha_0$, and $\Delta\alpha = \alpha - \alpha_0$. Equation (15) can be used to calculate free energy derivatives at each atom of an inhibitor, in both the unbound and bound states, and this information is very useful for guiding the design of new specific ligands [290]. An interesting alternative is the pictorial representation of free energy components (PROFEC) proposed by Radmer and Kollman [126]. It generates a free energy grid indicating how ligand modifications affect binding to an enzyme, and the same authors report a successful application to the trypsin-benzamidine system [126].

Liu et al. [291] suggested a perturbational methodology for estimating the relative binding free energies of two different ligands (L and L') to the same target enzyme (E). It uses a common non-physical reference ligand (RL). To build this single reference ligand, soft-core interaction sites are introduced at all positions where atoms can be created or deleted. These sites interact with the surroundings through the modified non-bonded potential proposed by Beutler et al. [252]. Because the reference ligand is very similar to both ligands (RL → L, RL → L'), the associated free energy differences can be calculated from a single simulation of each reference state (RL or E-RL). Radmer and Kollman used a similar methodology to calculate binding affinity differences between substituted benzamidines bound to trypsin [126]. Another interesting example is the study of the affinities of a series of small aromatic ligands for a hydrophobic cavity in T4 lysozyme [292]. Also worth mentioning is the λ-dynamics method introduced by Kong, Guo and Brooks [113, 293, 294].
In this method, several ligands compete simultaneously for a common enzyme in a single simulation, with no interaction between their characteristic atoms. The resulting trajectories yield the corresponding relative binding free energies. The method has been used to study the competitive binding of a set of benzamidine derivatives to trypsin [113, 294].
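The first-order estimate of equation (15) can be checked on a toy system where the exact answer is known. Consider a one-dimensional harmonic well $V = \tfrac12\alpha x^2$, in which the force constant plays the role of the parameter α; the model is an assumption made for illustration. The derivative is the ensemble average $(\partial F/\partial\alpha)_{\alpha_0} = \langle \partial V/\partial\alpha \rangle_{\alpha_0} = \langle x^2/2 \rangle_{\alpha_0}$, obtained from a single simulation at the reference value. The exact result is $\Delta F = \tfrac12 k_BT \ln(\alpha/\alpha_0)$.

```python
import numpy as np

KT = 1.0  # k_B T in reduced units (assumption)

def dF_first_order(samples, d_alpha):
    """Eq. (15): Delta F ~ (dF/d alpha)_{alpha0} * Delta alpha, where
    dF/d alpha = <dV/d alpha>_{alpha0}; for V = 0.5 * alpha * x^2, dV/d alpha = 0.5 * x^2."""
    return float(np.mean(0.5 * samples**2)) * d_alpha

rng = np.random.default_rng(3)
alpha0 = 1.0
# one "simulation" at the reference value alpha0 (exact Boltzmann sampling of the well)
x0 = rng.normal(0.0, np.sqrt(KT / alpha0), size=200_000)

for d_alpha in (0.05, 0.5, 2.0):
    exact = 0.5 * KT * np.log((alpha0 + d_alpha) / alpha0)
    print(f"d_alpha={d_alpha}: first-order {dF_first_order(x0, d_alpha):.4f}, exact {exact:.4f}")
```

The linear estimate is close for small Δα and degrades as the perturbation grows.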
4. A QUANTUM INSIGHT INTO THE STUDY OF ENZYME-LIGAND INTERACTIONS
Two distinct general approaches are possible for modelling enzymatic systems with quantum mechanical (QM) methods. The first consists of studying the system, or part of it, isolated in the gas phase. The second studies the system including solvation. Including the surrounding solvent would be desirable in all cases to obtain reliable results for reactions in enzyme active sites. However, the most reliable methods are computationally expensive, and the direct application of, e.g., ab initio QM methods to condensed-phase systems is presently impractical. Accurate ab initio methods [295] are only viable for small to medium-sized molecules. Density functional theory [296] and semiempirical methods [297] are computationally more versatile, but they are still very limited for systems of thousands of atoms. Nevertheless, the quality of the model used should be paramount in the researcher's mind, so any method that provides a computationally feasible representation of reality is worth investigating. Quantum mechanics fundamentally involves computing the electron distribution in a molecule, which yields a mathematical description of molecular structure in terms of electrons and nuclei. Some computational problems, such as the calculation of molecular properties, are therefore best addressed by quantum mechanics. QM methods often succeed in studying enzyme-ligand interactions through small systems that mimic the real ones, as discussed later. QM calculations are also invaluable for determining force field parameters, and thus become indirectly central to the study of enzyme-ligand interactions.
For small reference molecules, such as amino acids, the force constants for bond, bond angle and torsion angle terms are determined quantum mechanically, as are the Lennard-Jones parameters and atomic partial charges [298]. Large molecules beyond the reach of quantum mechanics can then be modelled by molecular mechanics and/or molecular dynamics. Additionally, semiempirical treatments of solvated systems can be calibrated against accurate vacuum QM calculations. The binding free energy of an inhibitor to an enzyme is a crucial quantity. Free energy simulation approaches [247] often need QM calculations to complete the thermodynamic cycles that yield these binding free energies, which brings quantum mechanics to the forefront. Again, although indirectly, QM calculations are essential for modelling enzyme-ligand interactions.
As with any other method, the desired quality of the results has to be balanced against the cost of the calculation, and a choice must be made accordingly. Generally, the more CPU time one can afford, the more reliable the results. As computer power grows, QM methods will become more powerful tools for studying enzyme-ligand interactions. An alternative to examining a small part of a system accurately is to study the complete system approximately. By complete system we mean the enzyme and ligand together with the surrounding solvent. The development of hybrid QM/CM approaches [299] was a step forward in the study of enzymatic systems. They treat the region surrounding the active centre by quantum mechanics, which is more accurate, and describe the remainder classically. Hybrid QM/CM potentials have received increasing attention over the past 10 years and have been extensively used to include an explicit description of the solvent in studies of enzyme-ligand interactions. Several of these methods are presently available, including quantum mechanical/molecular mechanical (QM/MM) and quantum mechanical/continuum dielectric (QM/CD) models. They combine ab initio Hartree-Fock [300, 301], valence bond [303-306, 349], density functional [307-309] or semiempirical [238, 302, 310-319] methods with well-known force fields [39, 51, 88, 180, 320] or continuum models [321-325], and have achieved interesting results for enzymatic systems. The reliability of QM/CM calculations again depends on the trade-off between the accuracy of the QM method and its computational feasibility. Reviewing these methods falls outside the scope of this chapter. Instead, we introduce the reader to some recent studies of enzyme-ligand interactions that use QM and hybrid QM/MM or QM/CD methods.
4.1. Quantum Mechanical Methods
'Solving' the Schrödinger equation for the entire solvated enzyme-ligand system is presently impossible, even with semiempirical methods. However, quantum mechanical approaches contribute invaluably to the modelling of enzyme-ligand interactions. They do so either indirectly, through the parameterisation of appropriate force fields, or more directly, for example by calculating the behaviour of isolated reacting fragments of the enzyme-ligand complex. Potential energy functions are central to several standard methods in computational chemistry, such as molecular mechanics and dynamics. Parameterising a force field is difficult and laborious, and parameters are often determined by fitting calculated results to experimental data. These data are in many cases scarce and, even when available, the right selection
from which to parameterise the force field can be difficult to make [69]. To overcome this, force fields such as MMFF93 [326] and CFF91 [327] were developed from a large amount of high-quality quantum mechanical calculations, including electron correlation [328]. Interestingly, MMFF93 (Cerius package [329]) and CFF91 (Insight/Discover [329]) seem to be among the best force fields [298]. Force fields specifically designed for biomolecules, such as AMBER [51], CHARMM [39] and GROMOS [88], also rely on ab initio calculations on small model systems to supplement the available experimental data. These force fields, and through them ab initio calculations, have become invaluable in the study of enzyme-ligand interactions.

Focusing on the interaction of isolated fragments of the enzyme-ligand complex is probably the simplest approach. The literature contains several examples, ranging from the interaction between simple amino acids to more complicated attempts such as the search for an enzymatic mechanism. One example of the former is the salt bridge between Asp or Glu and Arg, which has been extensively studied in the gas phase, in solution and in the interior of proteins [238, 330-333]. These interactions are extremely important in proteins and, among other functions, act as binding sites, e.g. in the inhibition of the blood coagulation enzyme ~-factor XIIa [334]. Small models such as formate-guanidinium or acetate-methylguanidinium have been used to study the stability of salt bridges. Zheng and Ornstein [330], Barril et al. [331] and Ramos et al. [238, 332, 333] performed quantum mechanical calculations, mostly at the SCF/RHF and MP2 levels with sophisticated basis sets. Their common conclusion is that the environment has a large influence on whether the neutral or the zwitterionic form prevails.
The neutral form is well established in the gas phase, but the polarity of the solvent or of the protein environment determines which of the two forms occurs. Lluch and co-workers [335] used the standard semiempirical method AM1 [336] to study how the hydrogen bond network influences the zinc binding site of carbonic anhydrase II (CAII). For this purpose they built a reduced quantum mechanical model of wild-type CAII. It consists of the zinc ion coordinated to a water molecule and three histidines, plus the second shell of residues that hydrogen bond to the zinc ligands (threonine, asparagine, glutamine and glutamate). They also modelled some CAII variants by replacing some of the indirect ligands of the wild-type enzyme with other amino acid side chains. This study showed that the hydrogen bond network in which the direct zinc ligands of CAII are nested modulates the zinc binding affinity and the pKa of the zinc-bound water. The results obtained with these gas-phase models, although not quantitatively comparable to the
experimental quantities obtained in solution, qualitatively reproduced the main trends of the different ligands in a real hydrogen-bond network. Siegbahn [337] proposed a new six-step substrate mechanism for ribonucleotide reductase (RNR), based on density functional theory (DFT) calculations with the hybrid functional B3LYP and large basis sets. He built models of the enzyme-substrate complex from recently determined X-ray structures and used them to investigate different mechanisms for the steps leading from a ribonucleotide to a deoxyribonucleotide. Several basis sets were used so that any remaining inaccuracies could be attributed to the chemical models or to the B3LYP functional, rather than to the basis set [338]. These studies led him to conclude that dielectric effects from the surrounding protein are very small and of almost no importance in the process. The dielectric effects were obtained with the self-consistent reaction field method (see the following section on Hybrid QM/CM Methods), as implemented in the Gaussian-94 program [339]. Himo and Eriksson [340] studied the catalytic mechanism of pyruvate formate-lyase (PFL), which involves many reactions resembling those of the substrate mechanism of RNR. They performed DFT calculations with the hybrid functional B3LYP. Geometries were optimised with the triple-ζ basis set 6-311G(d,p), followed by single-point energy calculations with the larger 6-311+G(2d,2p) basis set. Small chemical models were used to represent the enzyme, and the calculations generally support the homolytic radical mechanism proposed by Kozarich and co-workers [341]. Topol et al. [342] applied DFT methods to determine the structural and electronic parameters of several classes of zinc-binding domains belonging to three distinct zinc finger families.
They found good agreement between their results and the corresponding experimental values for related systems. From their calculations they concluded that the geometric and electronic properties of the metal-binding sites in zinc fingers can be accurately determined without considering the environment outside the metal coordination sphere. Their paper discusses what the calculated binding affinities and ionisation potentials imply for the structure and function of zinc fingers and for the design of zinc finger-inactivating agents. Siegbahn and co-workers [343] performed further DFT-B3LYP calculations, this time on a neutral model system L3Cu···CuL3, to probe the mechanism of tyrosinase action. The ligands L, chosen to model histidines, were either ammonia or formimine. The authors focus on the choice of chemical model and its limitations, the location of the transition state for O-O activation
of O2 for both sets of ligands, and the phenol oxidation in the tyrosinase reaction sequence. Mulholland and Richards [344-346] carried out ab initio (MP2/6-31+G(d) and RHF/6-31+G(d)) and semiempirical (AM1, PM3 and MNDO) molecular orbital calculations on the enzyme citrate synthase. Their calculations covered the first stage of the citrate synthase reaction [344], the substrate oxaloacetate [345] and a simple model of the condensation reaction [346]. They aimed to model the nucleophilic intermediate produced by the rate-limiting step, to establish which form of acetyl-CoA is the likely intermediate, and to determine how the enzyme stabilises it. They found that the enolate is the likely nucleophilic intermediate in citrate synthase and that it is stabilised by hydrogen bonds.
4.2. Hybrid QM/CM Methods
The study of enzyme-ligand interactions in solution faces two main problems: (i) the huge number of particles that make up the entire system and (ii) its complexity. The resulting difficulties are immense. They include the complexity of the calculations, the impracticability of the more accurate theoretical treatments such as QM, and the inability of others, such as MM, to follow chemical reactions. QM/CM methods were first introduced by Warshel and Levitt [299] in 1976. They generally restrict the quantum mechanical description to the reaction centre and treat the rest of the system classically [310, 321-323, 349-360]. As previously mentioned, the QM region is represented by nuclei and electrons and has been treated with ab initio Hartree-Fock, valence bond, density functional and semiempirical descriptions. The CM region, on the other hand, uses computationally efficient models, ranging in the literature from continuum to molecular mechanics descriptions. QM/CM calculations have definitely become the most exciting class of methods for modelling enzyme-ligand interactions in condensed-phase systems. Åqvist and Warshel [304] wrote an excellent review on the simulation of enzyme reactions with valence bond force fields and other hybrid quantum/classical approaches. Several combinations of QM approaches with a continuum dielectric model [175, 325, 360-369] have been investigated. Continuum solvation models originate in the work of Onsager [325] on describing ions in solution, and they have proved flexible and accurate enough to become a popular tool today. In these methods, the solute is generally placed inside a cavity of appropriate shape within a continuous medium characterised by a dielectric constant.
The electronic distribution of the solute induces a charge density at the surface of the cavity, which creates a field that modifies the energy and properties of the solute. This reaction field effect is solved iteratively within the SCF procedure by including a supplementary potential term in the solute Hamiltonian. Continuum models have the advantage of effectively modelling long-range electrostatic contributions in solution; however, they present a major drawback for the modelling of enzyme-ligand interactions, since they disregard the structural details of the environment.
Treating the protein atoms and the surrounding solvent molecules explicitly constitutes the main alternative to continuum models. This obviously incurs a much heavier computational cost, and simplified solvent models have been introduced to reduce it. The Langevin dipole (LD) model [299] is one such example; here each solvent molecule is represented by a polarisable point dipole, assumed to obey the Langevin polarisation law, located on a 3D grid with a cubic unit cell. This model has been progressively improved and recently used in free energy simulations [370, 371].
Combined QM and MM potentials [372] have received increased attention over the last few years. The Hamiltonian of the system differs from a purely quantum mechanical one and generally comprises three terms:

$$\hat{H} = \hat{H}_a^{0} + \hat{H}_b + \hat{H}_{ab} \qquad (16)$$

where $\hat{H}_a^{0}$ is the vacuum Hamiltonian of the quantum mechanical region a, $\hat{H}_b$ is the energy arising from the interactions within the outer region b, including the boundary, and $\hat{H}_{ab}$ describes the interactions between regions a and b. Figure 3 schematically represents a condensed-phase system in which all three regions are indicated.
Figure 3. Representation of a condensed phase system
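To make the partition of Eq. (16) concrete, the sketch below evaluates a purely classical ("mechanical embedding") approximation to the coupling between regions a and b: Coulomb interactions between fixed point charges on the QM atoms and the MM point charges, plus a Lennard-Jones term with Lorentz-Berthelot combining rules. This is an assumption made for illustration; in a genuine QM/MM calculation the electrostatic part of $\hat{H}_{ab}$ enters the QM Hamiltonian as a one-electron operator, so the QM electrons are polarised by the MM charges. All coordinates, charges and Lennard-Jones parameters used with it are hypothetical.

```python
import math
from dataclasses import dataclass

# Coulomb constant in kcal mol^-1 Å e^-2, as used in common biomolecular force fields.
COULOMB_KCAL = 332.0637


@dataclass(frozen=True)
class Site:
    """A point site: position (Å), partial charge (e) and Lennard-Jones parameters."""
    xyz: tuple[float, float, float]
    charge: float
    sigma: float     # Å
    epsilon: float   # kcal/mol


def coupling_energy(qm_sites, mm_sites, excluded=frozenset()):
    """Classical estimate of the a-b coupling, returned as (Coulomb, LJ) in kcal/mol.

    `excluded` holds (qm_index, mm_index) pairs to skip, e.g. atoms bonded
    across the QM/MM boundary.
    """
    e_elec = 0.0
    e_vdw = 0.0
    for i, a in enumerate(qm_sites):
        for j, b in enumerate(mm_sites):
            if (i, j) in excluded:
                continue
            r = math.dist(a.xyz, b.xyz)
            e_elec += COULOMB_KCAL * a.charge * b.charge / r
            sigma = 0.5 * (a.sigma + b.sigma)        # Lorentz combining rule
            eps = math.sqrt(a.epsilon * b.epsilon)   # Berthelot combining rule
            sr6 = (sigma / r) ** 6
            e_vdw += 4.0 * eps * (sr6 * sr6 - sr6)
    return e_elec, e_vdw


def total_energy(e_qm, e_mm, qm_sites, mm_sites, excluded=frozenset()):
    """E = E_a + E_b + E_ab, with E_a and E_b supplied by external QM and MM codes."""
    e_elec, e_vdw = coupling_energy(qm_sites, mm_sites, excluded)
    return e_qm + e_mm + e_elec + e_vdw
```

In use, `e_qm` would come from a gas-phase QM calculation on region a and `e_mm` from a force-field evaluation of region b; electronic embedding differs precisely in that the MM charges are folded into the QM calculation rather than added afterwards.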
Unfortunately, QM/MM potentials are not free of problems. The most severe are probably the division of covalent bonds across the QM and MM regions and the lack of explicit polarisation in the MM approach. The first of these difficulties has been examined by several groups, who have proposed different schemes to deal with it. Warshel and Levitt [299] used a single hybrid orbital on the MM atom of the bond that crosses the QM/MM boundary; a similar approach was subsequently proposed by Rivail and co-workers [312, 355, 373] with their frozen orbital (or excluded orbital) method, in which continuity between the two regions is ensured by a strictly localised bond orbital (SLBO) obtained from model compounds. Another popular approach introduces link atoms [300, 310, 315] between covalently bonded QM and MM atoms to cap the valency of the QM atoms; the link atoms, usually hydrogen, do not interact with the MM atoms. These are by no means the only ways of dealing with this problem, and so far it does not appear to have an obvious solution. As pointed out before, electronic polarisation also presents a challenge, basically because the QM region is polarised while the MM region lacks explicit polarisation. Efforts to incorporate a polarisable MM model into a QM/MM calculation do exist in the literature [374-377], at the price of increasing the expense of the MM calculation.
In the empirical valence bond (EVB) model [304, 349, 370] a fairly small number of VB functions is used to fit a VB model of a chemical reaction path; the parameterisation of these functions is carried out to reproduce experimental or ab initio MO data. The simple EVB Hamiltonian thus calibrated for a model reaction in solution can subsequently be used in the description of the enzyme-ligand complex. One of the most ingenious attributes of the EVB model is that reducing the number of VB resonance structures does not introduce serious errors, as it would in an ab initio VB formulation, because the parameterisation of the VB framework ensures that the experimental or other calibration data are reproduced. This computationally efficient approach has been used extensively and with remarkable success [305, 306, 371, 379]. A similar method presented by Kim and Hynes [380] considers a non-equilibrium coupling between the solute and the solvent, the latter being treated as a dielectric continuum.
Despite all the problems inherent to QM/CM approaches, some extremely interesting and perceptive work has been described in the literature recently, in which a variety of approaches have been used, improvements introduced and results obtained ([351, 372] and references therein). The study of enzyme-catalysed reaction mechanisms, the calculation of relative binding free energies of substrates and inhibitors, and the determination of proton transfer processes in enzymatic reactions are all good examples of studies of enzyme-ligand interactions. Even though Warshel's EVB method [349] probably remains the most practical QM/CM approach for the study of enzyme catalysis, very useful work on enzyme-catalysed reactions has also been reported with other QM/CM approaches ([381] for an excellent review; [238, 319, 382-384]).
The usefulness of these hybrid methods stems from the accuracy of QM in treating the active site and the inhibitor/substrate, combined with the viability of classical mechanics for modelling the bulk of the enzyme not directly involved in the chemical reaction.
The hydrolysis of formamide, a model of a peptide bond, by thermolysin has been studied by Antonczak et al. [319], confirming that the activation barriers decrease when a second water molecule is introduced in the reaction coordinate. This water-assisted process is always favoured over the mechanism in which a single water molecule reacts. The authors first carried out a quantum study on a small system, considering only the metal, some model ligands and the substrate. The influence of the whole enzyme was then taken into account using the QM/MM LSCF method [312]. Slight differences between the reactions catalysed by the model complex and by the whole enzyme were encountered and explained on the basis of small geometry distortions induced by the amino acid residues surrounding the active centre.
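Barrier lowering of the kind reported for the water-assisted pathway translates into rate enhancement through transition state theory. A minimal sketch, assuming the Eyring equation $k = (k_B T/h)\exp(-\Delta G^{\ddagger}/RT)$ with a transmission coefficient of one; the barrier heights in the example are hypothetical and are not values taken from [319].

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34   # Planck constant, J s
R_KCAL = 1.987204e-3        # gas constant, kcal mol^-1 K^-1


def eyring_rate(dg_act: float, temperature: float = 298.15) -> float:
    """Rate constant (s^-1) from an activation free energy in kcal/mol (kappa = 1)."""
    prefactor = K_B * temperature / H_PLANCK
    return prefactor * math.exp(-dg_act / (R_KCAL * temperature))


def rate_enhancement(dg_slow: float, dg_fast: float,
                     temperature: float = 298.15) -> float:
    """k_fast / k_slow; only the barrier difference matters, the prefactor cancels."""
    return math.exp((dg_slow - dg_fast) / (R_KCAL * temperature))


if __name__ == "__main__":
    # Hypothetical barriers (kcal/mol) for a one-water and a two-water pathway.
    one_water, two_water = 35.0, 30.0
    print(f"k(one water)  = {eyring_rate(one_water):.3e} s^-1")
    print(f"k(two waters) = {eyring_rate(two_water):.3e} s^-1")
    print(f"enhancement   = {rate_enhancement(one_water, two_water):.1f}")
```

Because the prefactor cancels in the ratio, comparisons between mechanisms depend only on the difference in barrier heights, which is usually more reliable than the absolute barriers.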
Ramos et al. [238] have also used the QM/MM LSCF method [312] to understand why the pancreatic trypsin inhibitor, PTI, behaves as an inhibitor of trypsin rather than as a substrate. PTI places a peptide bond between a lysine and an alanine in the catalytic triad of trypsin, with the side chain of the lysine in the binding pocket of the enzyme, exactly as a cleavable peptide would do; yet no reaction takes place between PTI and trypsin. The QM/MM calculations show that the geometry adopted by the active site of the enzyme in the complex prevents the nucleophilic attack of the hydroxyl oxygen on the peptide bond of PTI.
Hillier and co-workers [385] performed both high-level electronic structure and semiempirical MO/molecular mechanical calculations to elucidate several key features of the catalytic pathway of the enzyme papain. The AMBER [51] force field was used for the MM calculations, and Gaussian94 [339] was used for both the semiempirical (AM1) and the DFT (B3LYP/3-21G*) QM calculations. The results of the hybrid calculations compare well with the relevant experimental findings and answer some important questions regarding the catalytic mechanism, thus highlighting the usefulness of QM/MM calculations for the chemistry of large, complex biomolecules.
Using combined ab initio quantum mechanical/molecular dynamics methods, Peräkylä and Kollman [386] have presented a detailed model of the mechanism of the aspartylglucosaminidase-catalysed cleavage of an amide bond. The QM model, consisting of the model substrate, the N-terminal amino acid and the oxyanion hole, was considered large enough to include all the important interactions between the enzyme and the scissile amide bond at the active site. The QM calculations were performed at the MP2/6-31G*//HF/6-31G* level, and MD simulations were carried out for all the intermediates and transition states of the reaction.
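Carving a QM model such as this out of a protein cuts covalent bonds at the QM/MM boundary, which is where the link-atom scheme described earlier in this section comes in. A minimal sketch of one common placement rule, assuming the hydrogen link atom lies on the line from the QM boundary atom towards the MM boundary atom at a fixed distance (1.09 Å, a typical C-H bond length, is an assumed default). Other placement rules, such as scaling the original bond length by a fixed ratio, are also in use, and nothing here should be read as the procedure used in the studies cited above; the coordinates are hypothetical.

```python
import math


def place_link_atom(qm_xyz, mm_xyz, bond_length=1.09):
    """Position (Å) of a hydrogen link atom capping a QM atom whose bond to an MM atom is cut.

    The link atom sits on the QM -> MM bond vector, `bond_length` Å from
    the QM atom.
    """
    direction = [m - q for q, m in zip(qm_xyz, mm_xyz)]
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("QM and MM boundary atoms coincide")
    return tuple(q + bond_length * c / norm for q, c in zip(qm_xyz, direction))


def cap_boundary(coords, boundary_bonds, bond_length=1.09):
    """Link-atom positions for each cut (qm_index, mm_index) bond; coords maps index -> xyz."""
    return [place_link_atom(coords[i], coords[j], bond_length)
            for i, j in boundary_bonds]


if __name__ == "__main__":
    # Hypothetical C-C bond of about 1.53 Å cut at the QM/MM boundary.
    coords = {0: (0.0, 0.0, 0.0), 1: (1.53, 0.0, 0.0)}
    print(cap_boundary(coords, [(0, 1)]))
```

As stated in the text, the link atoms in this scheme are normally excluded from interactions with the MM atoms; keeping track of those exclusions is left to the surrounding QM/MM code.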
A theoretical study based on MP2/6-31+G(d,p) and HF/6-31G(d) ab initio quantum mechanical calculations, coupled with Langevin dipole (LD) and polarised continuum (PCM) solvation models, has been carried out by Florián and Warshel [387] as a first systematic study of the free energy surfaces for the hydrolysis of methyl phosphate in aqueous solution. The important biological implication of this work is that, since the energetics of the associative and the dissociative mechanisms are not too different, the active sites of enzymes can select either mechanism depending on the particular electrostatic environment. This conclusion means that both mechanisms should be considered, which seems to contradict some previous studies of phosphoryl transfer reactions.
Using high-level ab initio (HF/6-31++G(d,p)) and DFT (BLYP and B3LYP/6-31++G(d,p)) calculations together with self-consistent reaction field simulations, Pan and McAllister [388] have addressed the problem of "short-strong" (SSHB) or "low-barrier" (LBHB) hydrogen bonds, which has drawn a great deal of interest in recent years. Using a model system of a formic acid molecule and a formate anion, they showed that the SSHB between these two species is significantly weakened in the presence of an extremely polar cavity. Yet, despite being weakened, this hydrogen bond was by no means extinct. This led the authors to conclude that even if the environment of an enzyme active site were as polar as water, the formation of such an SSHB would still be more favourable than a traditional weak hydrogen bond like the one formed between two formic acid molecules. The implication of these results is that the mechanism proposed in the literature for enzyme catalysis, in which an enzyme-bound intermediate is stabilised by an LBHB, cannot be ruled out on account of cavity polarity effects alone.
Recently, Kollman and co-workers [301] proposed a new ab initio/free energy (QM-FE) approach to study enzyme-catalysed reactions and the corresponding solution reactions. This approach has been applied specifically to the formation of the tetrahedral intermediate in trypsin, but it is expected to be generally applicable to a wide variety of enzyme systems and biomimetic models. The RESP methodology [389] has been applied here to generate charges for the quantum mechanical atoms in order to calculate their interaction with the molecular mechanical atoms, a general concern in this type of calculation.
Reaction intermediates and transition states can be modelled realistically when the protein environment is included. This issue has been addressed by Mulholland and Richards ([390] and references therein) for the citrate synthase enzyme.
While transition state and stable intermediate structures have been optimised at semiempirical and ab initio levels, these authors have also studied the reaction in the enzyme with both AM1 QM/MM and ab initio QM/MM calculations, which indicate that the enolate of acetyl-CoA is the likely nucleophilic intermediate in citrate synthase, in agreement with the gas-phase findings [346]. One conclusion of the paper is that semiempirical QM/MM simulations are likely to provide further useful understanding of enzyme reactions.
Tawa et al. [391] have calculated the relative binding free energies of peptidic inhibitors to HIV-1 protease [391-394] and its I84V mutant, providing some useful insights into the nature of drug resistance. Their methodology combines the semiempirical MNDO/H method [395], used to determine the protonation state of important residues in the enzyme active site; the AMBER force field [51], used to determine the gas-phase energetic contributions to the relative binding free energy; and dielectric continuum solvation, used to calculate the electrostatic hydration contributions. All of the results are within about 1 kcal/mol of experiment. It was found that, for a given active-site protonation state of the aspartic acid residues of HIV-1 protease, only the inclusion of solvent effects gave quantitatively accurate relative binding free energies.
Still on HIV-1 protease, Chatfield et al. [393] have published a theoretical study of its cleavage mechanism. They used hybrid QM/MM potentials to distinguish between two mechanisms that are both consistent with classical MD simulations. Although transition-state geometries and barrier heights for the initial reaction steps of the two mechanisms were identified by preliminary calculations at the HF/STO-3G level, these authors found that complete energy minimisation at the STO-3G level is very time-consuming; they therefore opted to use the AM1 potential for energy minimisation, which in fact generated good results, providing confidence in their strategy.
One conclusion that seems to be common to these studies is that QM/CM methodologies are indeed very well suited to this kind of study and should be used increasingly in the future where biochemical problems are concerned. Ab initio QM/CM is computationally very demanding, even though an increasing number of such studies are being reported in the literature; accordingly, empirical valence bond and semiempirical QM/CM methods remain attractive alternatives for modelling enzyme-ligand interactions.
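To put the accuracy of about 1 kcal/mol quoted above into perspective, a relative binding free energy $\Delta\Delta G$ corresponds to a ratio of binding constants $K_A/K_B = \exp(-\Delta\Delta G/RT)$. The sketch below also shows, schematically, how a gas-phase term and solvation terms combine through a thermodynamic cycle into a solution-phase binding free energy; this simple additive decomposition is an assumption for illustration and does not reproduce the protocol of Tawa et al. [391]. All input values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def binding_free_energy(dg_gas, dg_solv_complex, dg_solv_protein, dg_solv_ligand):
    """Solution-phase binding free energy (kcal/mol) from a gas-phase/solvation cycle."""
    return dg_gas + dg_solv_complex - dg_solv_protein - dg_solv_ligand


def relative_binding(dg_a, dg_b):
    """ddG = dG_A - dG_B; negative means inhibitor A binds more tightly than B."""
    return dg_a - dg_b


def affinity_ratio(ddg, temperature=298.15):
    """K_A / K_B implied by a relative binding free energy ddG in kcal/mol."""
    return math.exp(-ddg / (R_KCAL * temperature))


if __name__ == "__main__":
    # Hypothetical cycle terms (kcal/mol) for two inhibitors of the same enzyme;
    # the free-protein solvation term is identical and cancels in ddG.
    dg_a = binding_free_energy(-60.0, -120.0, -100.0, -70.0)
    dg_b = binding_free_energy(-55.0, -118.0, -100.0, -64.0)
    ddg = relative_binding(dg_a, dg_b)
    print(f"ddG = {ddg:.1f} kcal/mol, K_A/K_B = {affinity_ratio(ddg):.1f}")
```

At 298 K, `affinity_ratio(-1.0)` is roughly 5.4, so an error of 1 kcal/mol corresponds to about a factor of 5 in the predicted binding constant.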
5. CONCLUSIONS
Despite the incredibly fast development in computer power that the world has witnessed in the last decade, modelling enzyme-ligand interactions is still a challenging exercise. However, molecular modelling techniques are nowadays considered invaluable tools for providing information on enzyme-ligand complexes that is not readily available from experiments. Accordingly, the literature contains many interesting studies of such interactions.
We have given an overview of modelling techniques, qualitative and quantitative, suitable for the study of enzyme-ligand interactions. A wide range of programs is available nowadays for the qualitative modelling of such interactions. Examples of the techniques employed include interactive molecular graphics, homology modelling and docking. The information provided by this type of study can be very important, revealing structural details that are often difficult to ascertain otherwise.
Quantitative calculations can provide accurate information on enzyme-ligand interactions, and the main aim of the researcher is usually the determination of the energy profile of the associated reactions. A good description of the system must take the environment into account; hybrid QM/CM methods are able to do so, and they are probably the most promising techniques presently available. However, every technique has its own associated problems, and one should therefore always be critical of the results obtained.
ACKNOWLEDGEMENTS
A research grant (Praxis XXI/BD/5103/95) to ESH from the "Fundação para a Ciência e Tecnologia" is gratefully acknowledged.
REFERENCES
1. T. I. Oprea and G. R. Marshall, Perspect. Drug Discov. Design, 9/10/11 (1998) 35.
2. T. I. Oprea and C. L. Waller, K. B. Lipkowitz and D. B. Boyd (eds.), Rev. Comput. Chem., vol. 11, 127-240, Wiley-VCH, NY, 1997.
3. T. P. Lybrand, G. Náray-Szabó and A. Warshel (eds.), Computational Approaches to Biochemical Reactivity, 363-374, Kluwer Academic Publishers, the Netherlands, 1997.
4. M. Hahn, J. Med. Chem., 38 (1995) 2080.
5. A. Vedani, P. Zbinden, J. P. Snyder and P. A. Greenidge, J. Am. Chem. Soc., 117 (1995) 4987.
6. J. M. Thomas, Nature, 364 (1993) 478.
7. S. S. Hall, Science, 267 (1995) 620.
8. K. Wüthrich, Acta Crystallogr., D51 (1995) 249.
9. M. Billeter, Perspect. Drug Discov. Design, 3 (1995) 151.
10. A. M. Gronenborn and G. M. Clore, Crit. Rev. Biochem. Mol. Biol., 30 (1995) 351.
11. J. C. Hogan, Nature, 384 Suppl. (1996) 23.
12. R. C. Wade, A. R. Ortiz and F. Gago, Perspect. Drug Discov. Design, 9/10/11 (1998) 19.
13. R. D. Head, M. L. Smythe, T. I. Oprea, C. L. Waller, S. M. Green and G. R. Marshall, J. Am. Chem. Soc., 118 (1996) 3959.
14. P. A. Kollman, Curr. Opin. Struct. Biol., 4 (1994) 240.
15. D. E. Clark, C. W. Murray and J. Li, K. B. Lipkowitz and D. B. Boyd (eds.), Rev. Comput. Chem., vol. 11, 67-125, Wiley-VCH, NY, 1997.
16. J. Moult, J. T. Peterson, R. Judson and K. Fidelis, Proteins, 23 (1995) 2.
17. S. Mosimann, R. Meleshko and R. James, Proteins, 23 (1995) 301.
18. W. R. Pearson, Meth. Enzym., 183 (1990) 63.
19. S. F. Altschul, W. Gish, W. Miller, E. W. Myers and D. J. Lipman, J. Mol. Biol., 215 (1990) 403.
20. M. Sippl, Proteins, 17 (1993) 355.
21. R. Rodriguez and G. Vriend (1998), website article at http://swift.embl-heidelberg.de/future/articles/text/gambling.html.
22. U. Sreenivasan and P. H. Axelsen, Biochemistry, 31 (1992) 12785.
23. E. F. Henriques, M. J. Ramos and C. A. Reynolds, J. Comput.-Aided Mol. Design, 11 (1997) 547.
24. E. S. Henriques, M. J. Ramos, W. Floriano, J. A. N. F. Gomes, B. Maigret, A. Melo, M. C. Nascimento and N. Reuter, results to be published.
25. G. Vriend, J. Mol. Graph., 8 (1990) 52.
26. The QUANTA (tm) Program (c) 1986, 1999 Molecular Simulations Inc.
27. H. E. Dayringer, A. Tramontano and R. J. Fletterick, J. Mol. Graph., 4 (1986) 82.
28. R. A. Laskowski, M. W. MacArthur, D. S. Moss and J. M. Thornton, J. Appl. Cryst., 26 (1993) 283.
29. J. B. Moon and W. J. Howe, Proteins: Struct., Funct., Genet., 11 (1991) 314.
30. A. R. Sielicki, K. Hayakawa, F. Fajinaga, M. E. P. Murphy, M. Fraser, A. K. Muir, C. T. Carilli, J. A. Lewicki, J. D. Baxter and M. N. G. James, Science, 243 (1989) 1346.
31. W. Carlson, M. Karplus and E. Haber, Hypertension, 7 (1985) 13.
32. P. J. Goodford, J. Med. Chem., 28 (1985) 849.
33. R. C. Wade, K. J. Clark and P. J. Goodford, J. Med. Chem., 36 (1993) 140.
34. A. Miranker and M. Karplus, Proteins: Struct., Funct., Genet., 11 (1991) 29.
35. D. J. Danziger and P. M. Dean, Proc. R. Soc. London, B236 (1989) 115.
36. H.-J. Böhm, J. Comput.-Aided Mol. Design, 6 (1992) 61.
37. M. A. Murcko, K. B. Lipkowitz and D. B. Boyd (eds.), Rev. Comput. Chem., vol. 11, 1-66, Wiley-VCH, NY, 1997.
38. K. Appelt, R. J. Bacquet, C. A. Barlett, C. L. J. Booth, S. T. Freer, M. A. Fuhry, M. R. Gehring, S. M. Hermann, E. F. Howland, C. A. Janson, T. R. Jones, C.-C. Kan, V. Kathardekar, K. K. Lewis, G. P. Marzoni, D. A. Matthews, C. Mohor, E. W. Moomaw, C. A. Morse, S. J. Oatley, R. C. Ogden, M. R. Reddy, S. H. Reich, W. S. Schoettlin, W. W. Smith, M. D. Varney, J. E. Villafranca, R. W. Ward, S. Webber, S. E. Webber, K. Welsh and J. White, J. Med. Chem., 34 (1991) 1925.
39. A. D. MacKerell Jr., D. Bashford, M. Bellott, R. L. Dunbrack Jr., J. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher III, B. Roux, M. Schlenkrich, J. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiorkiewicz-Kuczera, D. Yin and M. Karplus, J. Phys. Chem. B, 102 (1998) 3586.
40. A. Caflisch, A. Miranker and M. Karplus, J. Med. Chem., 36 (1993) 2142.
41. C. S. Poornima and P. M. Dean, J. Comput.-Aided Mol. Design, 9 (1995) 500.
42. C. S. Poornima and P. M. Dean, J. Comput.-Aided Mol. Design, 9 (1995) 513.
43. C. S. Poornima and P. M. Dean, J. Comput.-Aided Mol. Design, 9 (1995) 521.
44. M. T. Pisabarro, A. R. Ortiz, A. Palomer, F. Cabré, L. Garcia, R. C. Wade, F. Gago, D. Mauleón and G. Carganico, J. Med. Chem., 37 (1994) 337.
45. G. Lauri and P. A. Bartlett, J. Comput.-Aided Mol. Design, 8 (1994) 51.
46. M. B. Eisen, D. C. Wiley, M. Karplus and R. E. Hubbard, Proteins: Struct., Funct., Genet., 19 (1994) 199.
47. A. Caflisch and M. Karplus, Perspect. Drug Discov. Design, 3 (1995) 51.
48. D. E. Clark, D. Frenkel, S. A. Levy, J. Li, C. W. Murray, B. Robson, B. Waszkowycz and D. R. Westhead, J. Comput.-Aided Mol. Design, 9 (1995) 13.
49. V. Tschinke and N. C. Cohen, J. Med. Chem., 36 (1993) 3863.
50. R. S. Bohacek and C. McMartin, J. Am. Chem. Soc., 116 (1994) 5560.
51. W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz Jr., D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell and P. A. Kollman, J. Am. Chem. Soc., 117 (1995) 5179.
52. D. K. Gehlhaar, K. E. Moerder, D. Zichi, C. J. Sherman, R. C. Ogden and S. T. Freer, J. Med. Chem., 38 (1995) 466.
53. G. Jones and P. Willett, Curr. Opin. Biotechnol., 6 (1995) 652.
54. T. Lybrand, Curr. Opin. Struct. Biol., 5 (1995) 224.
55. B. K. Shoichet and I. D. Kuntz, Protein Eng., 6 (1993) 723.
56. E. C. Meng, D. A. Gschwend, J. M. Blaney and I. D. Kuntz, Proteins, 7 (1993) 266.
57. T. J. A. Ewing and I. D. Kuntz, J. Comput. Chem., 18 (1997) 1175.
58. B. K. Shoichet, R. M. Stroud, D. V. Santi, I. D. Kuntz and K. M. Perry, Science, 259 (1993) 1445.
59. T. N. Hart and R. J. Read, Proteins: Struct., Funct., Genet., 13 (1992) 206.
60. S.-Y. Yue, Protein Eng., 4 (1990) 177.
61. D. S. Goodsell and A. J. Olsen, Proteins: Struct., Funct., Genet., 8 (1990) 195.
62. R. Judson, K. B. Lipkowitz and D. B. Boyd (eds.), Rev. Comput. Chem., vol. 10, 1-73, Wiley-VCH, NY, 1997.
63. R. C. Glen and A. W. R. Payne, J. Comput.-Aided Mol. Design, 9 (1995) 181.
64. Ajay and M. A. Murcko, J. Med. Chem., 38 (1995) 4953.
65. N. Froloff, A. Windemuth and B. Honig, Protein Sci., 6 (1997) 1293.
66. H.-J. Böhm, J. Comput.-Aided Mol. Design, 8 (1994) 243.
67. Q. Xue and E. S. Yeung, Nature, 373 (1995) 681.
68. J. R. H. Tame, J. Comput.-Aided Mol. Design, 13 (1999) 99.
69. M. D. Beachy, D. Chasman, R. B. Murphy, T. A. Halgren and R. A. Friesner, J. Am. Chem. Soc., 119 (1997) 5908.
70. E. S. Henriques, M. Bastos, C. F. G. C. Geraldes and M. J. Ramos, Int. J. Quantum Chem., 73 (1999) 2137.
71. C. E. Sansom, J. Wu and I. T. Weber, Protein Eng., 5 (1992) 659.
72. J. Lautz, H. Kessler, R. Kaptein and W. F. van Gunsteren, J. Comput.-Aided Mol. Design, 1 (1988) 1219.
73. B. L. Podlogar, R. A. Farr, D. Friedrich, C. Tarnus, E. W. Huber, R. J. Cregge and D. Schirlin, J. Med. Chem., 37 (1994) 368.
74. W. Doster, S. Cusack and W. Petry, Nature, 337 (1989) 754.
75. J. Smith, K. Kuczera and M. Karplus, Proc. Natl. Acad. Sci. USA, 87 (1990) 1601.
76. C. M. Dobson and M. Karplus, Method. Enzymol., 131 (1986) 362.
77. G. Wagner, Q. Rev. Biophys., 16 (1983) 1.
78. H. Frauenfelder, G. A. Petsko and D. Tsernoglou, Nature, 280 (1979) 558.
79. P. J. Artymiuk, C. C. F. Blake, D. E. P. Grace, S. J. Oatley, D. C. Phillips and M. J. E. Sternberg, Nature, 280 (1979) 563.
80. C. K. Woodward and B. D. Hilton, Annu. Rev. Biophys. Bioeng., 8 (1979) 99.
81. J. R. Lakowicz and G. Weber, Biochemistry, 12 (1973) 4171.
82. H. Frauenfelder, F. Parak and R. D. Young, Annu. Rev. Biophys. Biophys. Chem., 17 (1988) 451.
83. E. von Kitzing and E. Schmitt, J. Mol. Struct. (Theochem), 336 (1995) 245.
84. J. A. McCammon and S. C. Harvey, Dynamics of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, 1987.
85. M. Karplus and G. A. Petsko, Nature, 347 (1990) 631.
86. A. A. Rashin, Prog. Biophys. Mol. Biol., 60 (1993) 73.
87. S. Lifson and A. Warshel, J. Chem. Phys., 49 (1968) 5116.
88. J. Hermans, H. J. C. Berendsen, W. F. van Gunsteren and J. P. M. Postma, Biopolymers, 23 (1984) 1513.
89. Y. Sun, J. W. Caldwell and P. A. Kollman, J. Phys. Chem., 99 (1995) 10081.
90. Y. Ding, D. N. Bernardo, K. Krogh-Jespersen and R. M. Levy, J. Phys. Chem., 99 (1995) 11575.
91. S. W. Rick and B. J. Berne, J. Am. Chem. Soc., 118 (1996) 672.
92. R. W. Dixon and P. A. Kollman, J. Comput. Chem., 18 (1997) 1632.
93. P. J. Winn, G. G. Ferenczy and C. A. Reynolds, J. Phys. Chem. A, 101 (1997) 5437.
94. G. G. Ferenczy, P. J. Winn and C. A. Reynolds, J. Phys. Chem. A, 101 (1997) 5446.
95. P. J. Winn, G. G. Ferenczy and C. A. Reynolds, J. Comput. Chem., 20 (1999) 704.
96. C. A. Reynolds, J. W. Essex and W. G. Richards, Chem. Phys. Lett., 199 (1992) 257.
97. J. W. Essex, C. A. Reynolds and W. G. Richards, J. Am. Chem. Soc., 114 (1992) 3634.
98. C. A. Reynolds, J. W. Essex and W. G. Richards, J. Am. Chem. Soc., 114 (1992) 9075.
99. W. F. van Gunsteren, Computer Simulations of Biomolecular Systems; Theoretical and Experimental Applications, vol. 1, W. F. van Gunsteren and P. K. Weiner (eds.), ESCOM, Leiden, 1989.
100. M. L. Lamb and W. L. Jorgensen, Curr. Opin. Chem. Biol., 1 (1997) 449.
101. W. L. Jorgensen and J. Tirado-Rives, J. Phys. Chem., 100 (1996) 14508.
102. H. Senderowitz and W. C. Still, J. Comput. Chem., 19 (1998) 1736.
103. D. L. Beveridge and F. M. DiCapua, Annu. Rev. Biophys. Biophys. Chem., 18 (1989) 431.
104. H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, A. DiNola and J. R. Haak, J. Chem. Phys., 81 (1984) 3684.
105. S. Nosé, J. Chem. Phys., 81 (1984) 511.
106. W. G. Hoover, Phys. Rev. A, 31 (1985) 1695.
107. G. J. Martyna, M. L. Klein and M. Tuckerman, J. Chem. Phys., 97 (1992) 2635.
108. D. A. Pearlman and P. R. Connelly, J. Mol. Biol., 248 (1995) 696.
109. G. Liang, R. K. Schmidt, H.-A. Yu, D. A. Cumming and J. W. Brady, J. Phys. Chem., 100 (1996) 2528.
110. T. Fox, T. S. Scanlan and P. A. Kollman, J. Am. Chem. Soc., 119 (1997) 11571.
111. Y. Yan, B. W. Erickson and A. Tropsha, J. Am. Chem. Soc., 117 (1995) 7592.
112. J. Norberg and L. Nilsson, J. Am. Chem. Soc., 117 (1995) 10832.
113. Z. Guo, C. L. Brooks III and X. Kong, J. Phys. Chem. B, 102 (1998) 2032.
114. V. Helms and R. C. Wade, J. Am. Chem. Soc., 120 (1998) 2710.
115. C. M. Soares, P. J. Martel, J. Mendes and M. A. Carrondo, Biophys. J., 74 (1998) 1708.
116. C. A. Sotriffer, W. Flader, A. Cooper, B. M. Rode, D. S. Linthicum, K. R. Liedl and J. M. Varga, Biophys. J., 76 (1999) 2966.
117. M. C. Menziani, C. A. Reynolds and W. G. Richards, J. Chem. Soc. Chem. Commun., (1989) 853.
118. J. Gao, K. Kuczera, B. Tidor and M. Karplus, Science, 244 (1989) 1069.
119. B. Tidor and M. Karplus, Biochemistry, 30 (1991) 3217.
120. T. Simonson and A. T. Brünger, Biochemistry, 31 (1992) 8661.
121. F. T. K. Lau and M. Karplus, J. Mol. Biol., 236 (1994) 1049.
122. T. Simonson, G. Archontis and M. Karplus, J. Phys. Chem. B, 101 (1997) 8349.
123. M. Zacharias, T. P. Straatsma, J. A. McCammon and F. A. Quiocho, Biochemistry, 32 (1993) 7428.
124. J. W. Essex, D. L. Severance, J. Tirado-Rives and W. L. Jorgensen, J. Phys. Chem. B, 101 (1997) 9663.
125. K. Kuczera, J. Gao, B. Tidor and M. Karplus, Proc. Natl. Acad. Sci. USA, 87 (1990) 8481.
126. R. J. Radmer and P. A. Kollman, J. Comput.-Aided Mol. Design, 12 (1998) 215.
127. M. A. McCarrick and P. A. Kollman, J. Comput.-Aided Mol. Design, 13 (1999) 109.
128. A. Di Nola and A. T. Brünger, J. Comput. Chem., 19 (1998) 1229.
129. J. Zeng, M. Fridman, H. Maruta, H. R. Treutlein and T. Simonson, Protein Sci., 8 (1999) 50.
130. T. Fox and P. A. Kollman, Proteins Struct. Funct. Gen., 25 (1996) 315.
131. J. A. McCammon and S. H. Northrup, Nature, 293 (1981) 316.
132. S. T. Wlodek, T. W. Clark, L. R. Scott and J. A. McCammon, J. Am. Chem. Soc., 119 (1997) 9513.
133. H.-X. Zhou, S. T. Wlodek and J. A. McCammon, Proc. Natl. Acad. Sci. USA, 95 (1998) 9280.
134. S. K. Lüdemann, O. Carugo and R. C. Wade, J. Mol. Model., 3 (1997) 369.
135. G. M. Keseru, I. Kolossvary and I. Szekely, Int. J. Quantum Chem., 73 (1999) 123.
136. B. Y. Mao, Biochem. J., 288 (1992) 109.
137. Y. T. Chang, G. H. Loew, A. E. Rettie, T. A. Baillie, P. R. Sheffels and P. R. O. Demontellano, Int. J. Quantum Chem., S20 (1993) 161.
138. M. D. Paulsen and R. L. Ornstein, Protein Eng., 6 (1993) 359.
139. G. H. Peters, D. M. F. Vanaalten, A. Svensen and R. Bywater, Protein Eng., 10 (1997) 149.
140. D. C. Chatfield, A. Szabo and B. R. Brooks, J. Am. Chem. Soc., 120 (1998) 5301.
141. D. Fushman, O. Ohlenschläger and H. Rüterjans, J. Biomol. Struct. Dynam., 11 (1994) 1377.
142. D. Fushman, R. Weisemann, H. Thüring, O. Ohlenschläger and H. Rüterjans, Int. J. Quantum Chem., 59 (1996) 291.
143. G. Rastelli and L. Constantino, Bioorg. Med. Chem. Lett., 8 (1998) 641.
144. D. N. Chin and G. M. Whitesides, J. Am. Chem. Soc., 117 (1995) 6153.
145. B. Manunza, S. Deiana, M. Pintore, V. Solinas and C. Gessa, J. Mol. Struct. (Theochem), 419 (1997) 33.
146. K. A. Brameld and W. A. Goddard III, J. Am. Chem. Soc., 120 (1998) 3571.
147. W. C. Guida, R. S. Bohacek and M. D. Erion, J. Comput. Chem., 13 (1992) 214.
148. C. McMartin and R. S. Bohacek, J. Comput.-Aided Mol. Design, 9 (1995) 237.
149. N. R. Taylor and M. Vonitzstein, J. Comput.-Aided Mol. Design, 10 (1996) 233.
150. F. Cardona, A. Goti, A. Brandi, M. Scarselli, N. Niccolai and S. Mangani, J. Mol. Model., 3 (1997) 249.
151. M. Filizola, J. J. Perez, A. Palomer and D. Mauleon, J. Mol. Graph. Model., 15 (1997) 290.
152. B. Manunza, S. Deiana, M. Pintore and C. Gessa, Soil Biol. Biochem., 31 (1999) 789.
153. G. Q. Liang and J. F. Sebastian, Bioorg. Chem., 26 (1998) 295.
154. A. Bencsura, I. Y. Enyedy and I. M. Kovach, J. Am. Chem. Soc., 118 (1996) 8531.
155. T. Lee and J. B. Jones, J. Am. Chem. Soc., 118 (1996) 502.
156. T. Lee and J. B. Jones, J. Am. Chem. Soc., 119 (1997) 10260.
157. S. K. Sreedharan, C. Verma, L. S. D. Caves, S. M. Brocklehurst, S. E. Gharbia, H. N. Shah and K. Brocklehurst, Biochem. J., 316 (1996) 777.
158. P. W. Smith, S. L. Sollis, P. D. Howes, P. C. Cherry, K. N. Cobley, H. Taylor, A. R. Whittington, J. Scicinski, R. C. Bethell, N. Taylor, T. S. Skarzynski, A. Cleasby, O. Singh, A. Wonacott, J. Varghese and P. Colman, Bioorg. Med. Chem. Lett., 6 (1996) 2931.
159. N. R. Taylor, A. Cleasby, O. Singh, T. S. Skarzynski, A. J. Wonaccot, P. W. Smith, S. L. Sollis, P. D. Howes, P. C. Cherry, R. Bethell, P. Colman and J. Varghese, J. Med. Chem., 41 (1998) 798.
160. R. Kato, O. Takahashi, Y. Kiso, I. Moriguchi and S. Hirono, Chem. Pharm. Bull., 42 (1994) 176.
161. R. E. Babine, N. Zhang, A. R. Jurgens, S. R. Schow, P. R. Desai, J. C. James and M. F. Semmelhack, Bioorg. Med. Chem. Lett., 2 (1992) 541.
162. A. Villalobos, J. F. Blake, C. K. Biggers, T. W. Butler, D. S. Chapin, Y. P. L. Chen, J. L. Ives, S. B. Jones, D. R. Liston, A. A. Nagel, D. M. Nason, J. A. Nielsen, I. A. Shalaby and W. F. White, J. Med. Chem., 37 (1994) 2721.
163. C. Czaplewski, Z. Grzonka, M. Jaskolski, F. Kasprzykowski, M. Kozak, E. Politowska and J. Ciarkowski, Biochim. Biophys. Acta-Protein Struct. Mol. Enzymol., 1431 (1999) 290.
164. R. Rosenfeld, S. Vajda and C. Delisi, Annu. Rev. Biophys. Biomol. Struct., 24 (1995) 677.
165. R. J. Read, T. N. Hart, M. D. Cummings and S. R. Ness, Supramol. Chem., 6 (1995) 135.
166. P. Kollman, Pharm. Res., 15 (1998) 368.
167. E. Lunney, S. E. Hagen, J. M. Domagala, C. Humblet, J. Kosinski, B. D. Tait, J. S. Warmus, M. Wilson, D. Ferguson, D. Hupe, P. J. Tummino, E. T. Baldwin, T. N. Bhat, B. S. Liu and J. W. Erickson, J. Med. Chem., 37 (1994) 2664.
168. J. A. Montgomery, Farmaco, 48 (1993) 297.
169. J. A. Montgomery, S. Niwas, J. D. Rose, J. A. Secrist, Y. S. Babu, C. E. Bugg, M. D. Erion, W. C. Guida and S. E. Ealick, J. Med. Chem., 36 (1993) 55.
170. A. Caflisch, S. Fischer and M. Karplus, J. Comput. Chem., 18 (1997) 723.
171. M. C. Maurer, J. Y. Trosset, C. C. Lester, E. E. Dibella and H. A. Scheraga, Proteins Struct. Funct. Gen., 34 (1999) 29.
172. D. G. Green, K. E. Meacham, M. Surridge, F. van Hoesel and H. J. C. Berendsen, Metecc-95 Proceedings, (1995) 434.
173. A. Melo, A. T. Puga, F. Gentil, N. Brito, A. P. Alves and M. J. Ramos, J. Chem. Inf. Comput. Sci., accepted for publication.
174. A. Melo, A. T. Puga, N. Brito, A. P. Alves and M. J. Ramos, to be published.
175. B. Honig, K. Sharp and A.-S. Yang, J. Phys. Chem., 97 (1993) 1101.
176. B. Honig and A. Nicholls, Science, 268 (1995) 1144.
177. L.-P. Lee and B. Tidor, J. Chem. Phys., 106 (1997) 8681.
178. E. Kangas and B. Tidor, J. Chem. Phys., 109 (1998) 7522.
179. F. S. Lee, Z.-T. Chu, M. B. Bolger and A. Warshel, Protein Eng., 5 (1992) 215.
180. F. S. Lee, Z.-T. Chu and A. Warshel, J. Comput. Chem., 14 (1993) 161.
181. M. E. Davis and J. A. McCammon, J. Comput. Chem., 12 (1991) 909.
182. J. D. Madura, M. E. Davis, M. K. Gilson, R. C. Wade, B. A. Luty and J. A. McCammon, Reviews in Computational Chemistry, vol. 5, K. B. Lipkowitz and D. Boyd (eds.), VCH Publishers, New York, 1994.
183. J. Novotny, R. E. Bruccoleri, M. Davis and K. A. Sharp, J. Mol. Biol., 268 (1997) 401.
184. D. Xu, S. L. Lin and R. Nussinov, J. Mol. Biol., 265 (1997) 68.
185. Z. S. Hendsch, C. V. Sindelar and B. Tidor, J. Phys. Chem., 102 (1998) 4404.
587
186. X. Barril, C. Aleman, M. Orozco and F. L. Luque, Proteins Struct. Funct. Genet., 32 (1998) 67. 187. J. Shen, J. Med. Chem., 40 (!.997) 2953. 188. K. A. Sharp, B iophys. Chem., 61 (1996) 37. 189. J. D. Madura, Y. Nakajima, R. M. Hamilton, A. Wierzbicki and A. Warshel, Struct. Chem., 7 (1996) 131. 190. T. Zhang and D. E. Koshland, Protein Sci., 5 (1996) 348. 191. M. Fujinaga, K. Huang, K. S. Bateman and M. N. G. James, J. Mol. Biol., 284 (1998) 1683. 192. P. H. Htinenberger, V. Helms, N. Narayana, S. S. Taylor and J. A. McCammon, 38 (1999) 2358. 193. W. R. Cannon, B. J. Garrison and S. J. Benkovic, J. Am. Chem. Soc., 119 (1997) 2386. 194. C. J. Gibas, S. Subramaniam, J. A. McCammon, B. C. Braden and R. J. Poljak, Biochemistry, 36 (1997) 15599. 195. J. Antosiewicz, J. A. McCammon and M. K. Gilson, Biochemistry, 35 (1996) 7819. 196. S. T. Wlodek, J. Antosiewicz, J. A. McCammon, T. P. Straatsma, M. K. Gibson, J. M. Briggs, C. Humblet and J. L. Sussman, Biopolymers, 38 (1996) 109. 197. P. Beroza and D. R. Fredkin, J. Comput. Chem., 117 (1996) 1229. 198. E. Demchuk and R. C. Wade, J. Phys. Chem., 100 (1996) 17373. 199. C. R. D. Lancaster, H. Michel, B. Honig and M. R. Gunner, Biophys. J., 70 (1996) 2469. 200. R. P. Christen, S. I. Nomikos and E. T. Smith, J. Biol. Inorg. Chem., 1 (1996) 515. 201. C. M. Soares, P. J. Martel and M. A. Carrondo, J. Biol. Inorg. Chem., 2 (1997) 714. 202. A. Kannt, C. R. D. Lancaster and H. Michel, Biophys. J., 74 (1998) 708. 203. P. J. Martel, C. M. Soares, A. M. Baptista, M. Fuxreiter, G. N~iray-Szab6, R. O. Louro and M. A. Carrondo, J. Biol. Inorg. Chem., 4 (1999) 73. 204. A. M. Baptista, P. J. Martel and C. M. Soares, Biophys. J., 76 (1999) 2978. 205. J. Antosiewicz and J. A. McCammon, Biophys. J., 69 (1995) 57. 206. J. A. McCammon, Curr. Opinion Struct. Biol., 8 (1998) 245. 207. R. C. Wade, R. R. Gabdoulline, S. K. Ltidemann and V. Lounas, Proc. Natl. Acad. Sci. USA, 95 (1998) 5942. 208. R. C. Tan, T. N. Truong, J. A. McCammon and J. 
L. Sussman, B iochemistry, 32 (1993) 401. 209. S. T. Wlodek, J. Antosiewicz and J. M. Briggs, J. Am. Chem. Soc., 119 (1997) 8159.
588
210. R. C. Wade, R. R. Gabdoulline and B. A. Luty, Proteins Struct. Funct. Gen., 31 (1998) 406. 211. R. C. Wade, B. A. Luty, E. Demchuk, J. D. Madura, M. E. Davis, J. M. Briggs and J. A. McCammon, Struct. Biol., 1 (1994) 65. 212. J. Antosiewicz, J. M. Briggs and J. A. McCammon, Eur. Biophys. J., 24 (1996) 137. 213. H.-X. Zhou, J. Chem. Phys., 108 (1998) 8146. 214. A. H. Elcock, M. J. Potter, D. A. Matthews, D. R. Knighton and J. A. McCammon, J. Mol. Biol., 262 (1996) 370. 215. F. Polticelli, A. Battistoni, P. O'Neil, G. Rotilio and A. Desideri, Protein Sci., 7 (1998) 2354. 216. M. E. Stroppolo, M. Sette, P. O'Neil, F. Polizio, M. T. Cambria and A. Desideri, Biochemistry, 37 (1998) 12287. 217. R. C. Wade, B iochem. Soc. Trans., 24 (1996) 254. 218. H.-X. Zhou, J. Phys. Chem. B, 101 (1997) 6642. 219. Z. Radic, P. D. Kirchhoff, D. M. Quinn, J. A. McCammon and P. Taylor, J. Biol. Chem., 272 (1997) 23265. 220. N. A. Baker and J. A. McCammon, J. Phys. Chem. B, 103 (1999) 615. 221. H.-X. Zhou, J. M. Briggs, S. Tara and J. A. McCammon, Biopolymers, 45 (1998) 355. 222. C. F. Wong and J. A. McCammon, J. Am. Chem. Soc., 108 (1986) 3830. 223. P. A. Bash, U. C. Singh, F. K. Brown, R. Langridge and P. A. Kollman, Science, 235 (1987) 574. 224. K. M. Merz, M. A. Murcko and P. A. Kollman, J. Am. Chem. Soc., 113 (1991) 4484. 225. J. J. McDonald and C. L. Brooks III, J. Am. Chem. Soc., 113 (1991) 2295. 226. J. J. McDonald and C. L. Brooks III, J. Am. Chem. Soc., 114 (1992) 2062. 227. P. R. Gerber, A. E. Mark and W. F. van Gunteren, J. Comput.-Aided Mol. Design, 7 (1993) 305. 228. P. L. Cummins and J. E. Gready, J. Comput. Chem, 15 (1994) 704. 229. P. L. Cummins and J. E. Gready, Mol. Simul., 15 (1995) 155. 230. D. M. Ferguson, R. J. Radmer and P. A. Kollman, J. Med. Chem., 34 (1991) 2654. 231. B. G. Rao and M. A. Murcko, Protein Eng., 9 (1996) 767. 232. S. W. Rick, I. A. Topol, J. W. Erickson and S. K. Burt, Protein Sci., 7 (1998) 1750. 233. G. Rastelli, B. Thomas, P. A. 
Kollman and D. V. Santi, J. Am. Chem. Soc., 117 (1995) 7213. 234. J. Wang, Z. Szewczuk, S.-Y. Yue, Y. Tsuda, Y. Konishi and E. O. Purisima, J. Mol Biol., 253 (1995) 473.
589
235. A. Elofsson, T. Kulinski, R. Rigler and L. Nilsson, Proteins Struct. Funct. Gen., 17 (1993) 167. 236. I. Ghosh and O. Edholm, Biophys. Chem., 50 (1994) 237. 237. A. Melo and M. J. Ramos, J. Peptide Res., 50 (1997) 382. 238. M. J. Ramos, A. Melo, E. S. Henriques, J. A. N. F. Gomes, M. Reuter, B. Maigret, W. B. Floriano and M. A. C. Nascimento, Int. J. Quantum Chem., 74 (1999) 299. 239. S. Miyamoto and P. A. Kollman, Proc. Natl. Acad. Sci. USA, 90 (1993) 8402. 240. G. Archontis, T. Simonson, D. Moras and M. Karplus, J. Mol. Biol., 275 (1998) 823. 241. R. W. Dixon and P. Kollman, Proteins Struct. Funct. Gen., 36 (1999) 471. 242. M. A. L. Erickson, J. Pitera and P. A. Kollman, J. Med. Chem., 42 (1999) 868. 243. J. G. Kirkwood, J. Chem. Phys., 3 (1935) 300. 244. D. A. Pearlman, J. Chem. Phys., 98 (1993) 8946. 245. D. A. Pearlman, J. Comput. Chem., 15 (1994) 105. 246. C. L. Brooks III, J. Phys. Chem., 90 (1986) 6680. 247. R. W. Zwanzig, J. Chem. Phys., 22 (1954)1420. 248. W. L. Jorgensen and C. Ravimohan, J. Chem. Phys., 83 (1985) 3050. 249. M. Mezei and D. L. Beveridge, Annu. N. Y. Acad. Sci., 482 (1986) 1. 250. A. J. Cross, Annu. N. Y. Acad. Sci., 482 (1986) 89. 251. M. Mezei, J. Comput. Chem., 13 (1992) 651. 252. T. C. Beutler, A. E. Mark, R. C. van Schaik, P. R. Gerber and W. F. van Gunsteren, Chem. Phys. Lett., 222 (1994) 529. 253. M. Mezei and G. Jancs6, Chem. Phys. Lett., 239 (1995) 237. 254. S. Sen and L. Nilsson, J. Comput. Chem., 20 (1999) 877. 255. S. Boresch and M. Karplus, J. Chem. Phys., 105 (1996) 5145. 256. D. A. Pearlman and P. A. Kollman, J. Chem. Phys., 94 (1991) 4532. 257. S. Boresch and M. Karplus, J. Phys. Chem. A, 103 (1999) 103. 258. S. Boresch and M. Karplus, J. Phys. Chem. A, 103 (1999)119. 259. D. A. Pearlman, J. Phys. Chem., 98 (1994) 1487. 260. C. Chipot, P. A. Kollman and D. A. Pearlman, J. Comput. Chem., 17 (1996) 1112. 261. R. J. Radmer and P. A. Kollman, J. Comput. Chem., 18 (1997), 902. 262. D. A. Kofke and P. T. Cummings, Mol. 
Phys., 92 (1997) 973. 263. D. L. Severance, J. W. Essex and W. L. Jorgensen, J. Comput. Chem., 16(1995) 311. 264. M. Prevost, S. J. Wodak, B. Tidor and M. Karplus, Proc. Natl. Acad. Sci. USA, 88 (1991) 10880. 265. B. Prod'hom and M. Karplus, Protein Eng., 6 (1993)585.
590
266. N. Honda, Y. Komeiji, M. Uebayasi and I. Yamato, Proteins Struct. Funct. Gen., 26 (1996) 459. 267. S. Yun-yu, A. E. Mark, W. Cun-Xin, H. Fuhua, H. J. C. Berendsen and W. F. van Gunsteren, Protein Eng., 6 (1993) 289. 268. P. E. Smith and W. F. van Gunsteren, J. Phys. Chem., 98 (1994) 13735. 269. A. E. Mark and W. F. van Gunsteren, J. Mol.Biol., 240 (1994) 167. 270. S. Boresch, G. Archontis and M. Karplus, Proteins Struct. Funct. Gen., 20 (1994) 25. 271. S. Boresch and M. Karplus, J. Mol. Biol., 254 (1995) 801. 272. G. P. Brady and K. A. Sharp, J. Mol. Biol., 254 (1995) 77. 273. G. Archontis and M. Karplus, J. Chem. Phys., 105 (1996) 11246. 274. J. Aqvist, C. Medina and J.-E. Samuelson, Protein Eng., 7 (1994) 385. 275. J. Aqvist and T. Hansson, J. Phys. Chem., 100 (1996) 9512. 276. M. D. Paulsen and R. L. Ornstein, Protein Eng., 9 (1996) 567. 277. H. A. Carlson and W. L. Jorgensen, J. Phys. Chem., 99 (1995), 10667. 278. M. L. Lamb, J. Tirado-Rives and W. L. Jorgensen, B ioorg. Med. Chem., 7 (1999) 851. 279. T. Hansson, J. Marelius and J. Aqvist, J. Comput.-Aided Mol. Design, 12 (1998) 27. 280. I. Muegge, H. Tao and A. Warshel, Protein Eng., 10 (1997) 1363. 281. I. Muegge, T. Schweins and A. Warshel, Proteins Struct. Funct. Gen., 30 (1998) 407. 282. W. L. Jorgensen, E. M. Duffy, J. W. Essex, D. L. Severance, J. F. Blake, D. K. Jones-Hertzog, M. L. Lamb and J. Tirado-Rives, Biomolecular Structure and Dynamics, G. Vergoten and T. Theophanides (Eds.), Kluwer Academic Publishers, Amsterdam, 1997. 283. D. K. Jones-Hertzog and W. L. Jorgensen, J. Med. Chem., 40 (1997) 1539. 284. T. Hansson and J. Aqvist, Protein Eng., 8 (1995) 1137. 285. J. Aqvist, J. Comput. Chem, 17 (1996) 1587. 286. J. Marelius, M. Graffner-Nordberg, T. Hansson, A. Hallberg and J./kqvist, J. Comput-Aided Mol. Design, 12 (1998) 119. 287. P. Cieplak and P. A. Kollman, J. Comput-Aided Mol. Design, 7 (1993) 291. P. Cieplak, D. A. Pearlman and P. A. Kollman, J. Chem. Phys., 101 (1994) 288. 627. 289. P. 
E. Smith and W. F. van Gunsteren, J. Chem. Phys., 100 (1994) 577. 290. P. Cieplak and P. A. Kollman, J. Mol. Recogn., 9 (1996) 103. 291. H. Liu, A. E. Mark and W. F. van Gunsteren, J. Phys. Chem., 100 (1996) 9485. 292. A. E. Mark, Y. Xu, H. Liu and W. F. van Gunsteren, Acta Biochim. Polonica, 42 (1995) 525.
591
293. X. Kong and C. L. Brooks III, J. Chem. Phys., 105 (1996) 2414. 294. Z. Guo and C. L. Brooks III, J. Am. Chem Soc., 120 (1998) 1920. 295. W. J. Hehre, L. Radom, P. v. R. Schleyer and J. A. Pople, Ab Initio Molecular Orbital Theory, Wiley & Sons, Inc., New York (1986) 296. T. Ziegler, Chem. Rev., 91 (1991) 651 297. W. Thiel, Adv. Chem. Phys., 93 (1996) 703 298. I. Pettersson and T. Liljefors, in Rev. Comput. Chem., K. B. Lipkowitz and D. B. Boyd Eds., VCH Publishers, Inc., New York (1996) 9, 16 299. A. Warshel and M. Levitt, J. Mol. Biol., 103 (1976) 227 300. U. C. Singh and P. A. Kollman, J. Comput. Chem., 7 (1986) 718 301. R. V. Stanton, M. Per~ikyl~i, D. Bakowies and P. A. Kollman, J. Am. Chem. Soc., 120 (1998) 3448 302. P. A. Bash, M. J. Field and M. Karplus, J. Am. Chem. Soc., 109 (1987) 8092 303. F. Bernardi, M. Olivucci and M. A. Robb, J. Am. Chem. Soc., 114 (1992) 1606 304. J. Aqvist and A. Warshel, Chem. Rev., 93 (1993) 2523 305. T. M. Glennon and A. Warshel, J. Am. Chem. Soc., 120 (1998) 10234 306. J. Bentzien, R. P. Muller, J. Florifin and A. Warshel, J. Phys. Chem B, 102 (1998) 2293 307. R. V. Stanton, D. S. Hartsough and K. M. Merz, Jr., J. Comput. Chem., 16 (1995) 113 308. P. D. Lyne, M. Hodoscek and M. Karplus, J. Phys. Chem. A, 103 (1999) 3462 309. R. P. Muller, T. Wesolowski and A. Warshel, in Density Functional Methods: Applications in Chemistry and Materials Science, M. Springbourg ed., John Wiley & Sons, Ltd., (1996) 310. M. J. Field, P. A. Bash and M. Karplus, J. Comput. Chem., 11 (1990) 700 311. V. V. Vasilyev, A. A. B liznyuk and A. A. Voityuk, Int. J. Quantum Chem., 44 (1992) 897 312. V. Thdry, D. Rinaldi, J.-L. Rivail, B. Maigret and G. Ferenczy, J. Comput. Chem., 15 (1994) 269 313. M. A. Thompson, E. D. Glendening and D. Feller, J. Phys. Chem., 98 (1994) 10465 314. M. A. Thompson and G. K. Schenter, J. Phys. Chem., 99 (1995) 6374 315. D. Bakowies and W. Thiel, J. Phys. Chem., 100 (1996) 10580 316. J. Gao, Acc. Chem. 
Res., 29 (1996) 298 317. G. Monard, M. Loos, V. Thery, K. Baka and J.-L. Rivail, Int. J. Quantum Chem., 58 (1996) 153 318. M. Hartmann, K. M. Merz, R. Vaneldik and T. Clark, J. Mol. Modeling, 4 (1998) 355
592
319. S. Antonczak, G. Monard, M. F. Ruiz-Ldpez and J.-L. Rivail, J. Am. Chem. Soc., 120 (1998) 8825 320. S. L. Mayo, B. D. Olafson and W. A. Goddard, III, J. Phys. Chem., 94 (1990) 8897 321. D. Rinaldi and J.-L- Rivail, Theor. Chim Acta, 32 (1973) 32, 57 322. O. Tapia and O. Goscinski, Mol. Phys., 29 (1975) 1653 323. R. Montagani and J. Tomasi, Int. J. Quantum Chem., 39 (1991) 851 324. for a review on continuum methods: C. J. Cramer and D. G. Truhlar, in Rev. Comput. Chem., K. B. Lipkowitz and D. B. Boyd Eds., VCH Publishers, Inc., New York (1995) 6, 1 325. L. Onsager, J. Am. Chem. Soc., 58 (1936) 1486 326. J. P. Bowen and N. L. Allinger, in Rev. Comput. Chem., K. B. Lipkowitz and D. B. Boyd Eds., VCH Publishers, Inc., New York (1991) 2, 81 327. J. R. Maple, M.-J. Hwang, T. P. Stockfisch, U. Dinur, M. Waldman, C. S. Ewig and A. T. Hagler, J. Comput. Chem., 15 (1994) 162; M.-J. Hwang, T. P. Stockfisch and A. T. Hagler, J. Am. Chem Soc., 116 (1994) 2515 328. U. Dinur and A. T. Hagler, in Rev. Comput. Chem., K. B. Lipkowitz and D. B. Boyd Eds., VCH Publishers, Inc., New York (1991) 2, 81 329. Molecular Simulations, Inc., 9685 Scranton Road, San Diego, CA 921212777, U.S.A. 330. Y.-J. Zheng and R. L. Ornstein, J. Am. Chem. Soc., 118 (1996) 11237 331. X. Barril, C. Alemfin, M. Orozco and F. J. Luque, Proteins: Structure, Function and Genetics, 32 (1998) 67 332. A. Melo and M. J. Ramos, Chem. Phys. Letters, 245 (1995) 498; A.Melo, M.J.Ramos, W.B.Floriano, J.A.N.F.Gomes, J.F.R.Lefio, A.L.Magalhfies, B.Maigret, M.C.Nascimento e N. Reuter, J. Mol Structure (Teochem), 463, 81 (1999). 333. A. Melo and M. J. Ramos, Int. J. Quantum Chem., 72 (1999) 157 334. M. J. Ramos, J. Mol. Graphics, 9 (1991) 91 335. S. Alvarez Santos, A. Gonzalez Lafont and J. M. Lluch, Can. J. Chem., 76 (1998) 1027 336. M. J. S. Dewar, E. G. Zoebisch, E. A. Healy and J. J. P. Stewart, J. Am. Chem. Soc., 107 (1985) 3902 337. P. E. M. Siegbahn, J. Am. Chem. Soc., 120 (1998) 8417 338. C. W. 
Bauschlicher, Jr., A. Ricca, H. Partridge and S. R. Langhoff in Recent Advances in Density Functional Methods, D. P. Chong, Ed., World Scientific Publishing Company, Singapore (1997), Part II, 165 339. M. J. Frisch, G. W. Trucks, H. B. Schlegel, P. M. W. Gill, B. G. Johnson, M. A. Robb, J. R. Cheeseman, T. A. Keith, G. A. Petersson, J. A. Montgomery, K. Raghavachari, M. A. A1-Laham, Y. G. Zakrzewski, J. V. Ortiz, J. B. Foresman, J. Cioslowski, B. B. Stefanov, A. Nanayakkara, M.
593
340. 341. 342. 343. 344. 345. 346. 347. 348. 349. 350. 351. 352. 353. 354. 355. 356. 357. 358. 359.
360. 361. 362.
Challacombe, C. Y. Peng, P. Y. Ayala, W. Chen, M. W. Wong, J. L. Andres, E. S. Replogle, R. Gomberts, R. L. Martin, D. J. Fox, J. S. B inkley, D. J. Defrees, J. Baker, J. P. Stewart, M. Head-Gordon, C. Gonzalez and J. A. Pople, Gaussian 94, Gaussian Inc.: Pittsburgh, PA (1995) F. Himo and L. A. Eriksson, J. Am. Chem. Soc., 120 (1998) 11449 C. V. Parast, K. K. Wong, S. A. Lewisch, J. W. Kozarich, J. Peisach and R. S. Magliozzo, Biochemistry, 34 (1995) 2393 I. A. Topol, J. R. Casasfinet, R. Gussio, S. K. Burt and J. W. Erickson, J. Mol. Structure (Theochem) 423 (1998) 13 T. Lind, P. E. M. Siegbahn and R. H. Crabtree, J. Phys. Chem. B, 103 (1999) 1193 A. J. Mulholland and W. G. Richards, J. Phys. Chem. B, 102 (1998) 6635 A. J. Mulholland and W. G. Richards, J. Mol. Structure (Theochem), 429 (1998) 13 A. J. Mulholland and W. G. Richards, J. Mol. Structure (Theochem), 427 (1998) 175 A. Warshel and R. M. Weiss, J. Am. Chem. Soc., 102 (1980) 6218 J.-K. Hwang, G. King, S. Creighton and A. Warshel, J. Am. Chem Soc., 110 (1988) 5297 A. Warshel, Computer Modeling of Chemical Reactions in Enzymes and Solutions, John Wiley & Sons, Inc., New York (1991) P. A. Kollman and K. M. Merz, Jr., Acc. Chem. Res., 23 (1990) 246 P. A. Bash, U. C. Singh, R. Langridge and P. A. Kollman, Science, 236 (1987) 564 V. Luzhkov and A. Warshel, J. Comput. Chem., 13 (1992) 199 J. Gao, J. Phys. Chem., 96 (1992) 537 J. Gao and X. Xia, Science, 258 (1992) 631 G. G. Ferenczy, J.-L. Rivail, P. R. Surjan and G. Naray-Szabo, J. Comput. Chem., 13 (1992) 830 H. Liu, F. Mtiller-Plathe and W. F. van Gunsteren, J. Chem. Phys., 102 (1995) 1722 A. Warshel and R. M. Weiss, J. Am. Chem. Soc., 102 (1980) 6218 J.-K. Hwang, G. King, S. Creighton and A. Warshel, J. Am. Chem Soc., 110 (1988) 5297 M. A. Cunningham and P. A. Bash, in Computer Simulation of Biomolecular Systems- Theoretical and Experimental Applications, W. F. Vangunsteren ed., Ed. Dordrecht: Kluwer Academic Publ, 3 (1997) 177 J. Chandrasekhar, S. F. 
Smith and W. L. Jorgensen, J. Am. Chem. Soc., 107 (1985) 155 S. Miertus, E. Scrocco and J. Tomasi, Chem. Phys., 55 (1981) 117 J. Tomasi and M. Persico, Chem. Rev., 94 (1994) 2027
594
363. A. A. Rashin, M. A. Bukatin, J. Andzelm and A. T. Hagler, B iophys. Chem., 51 (1994) 375 364. J. Shen, F. A. Quiocho, J. Comput. Chem., 16 (1995) 445 365. R. M. Jackson, J. E. Sternberg, J. Mol. Biol., 250 (1995) 258 366. D. Harris and G. Loew, J. Comput. Chem., 17 (1996) 273 367. M. T. Cances, V. Mennucci and J. Tomasi, J. Chem. Phys., 107 (1997) 3032 368. M. Cossi, V. Barone, B. Mennucci and J. Tomasi, Phys. Letters, 286 (1998) 253 369. A. L. Magalhfies, S. R. R. S. Madail and M. J. Ramos, to be published. 370. R. P. Muller, J. Florifin and A. Warshel, in B iomolecular Structure and Dynamics, G. Vergoten and T. Theophanides eds., Kluwer Academic Publishers, NATO-ASI, 47 (1997) 371. M. Fuxreiter and A. Warshel, J. Am. Chem. Soc., 120 (1998) 183 372. for a review on the subject: J. Gao, in Rev. Comput. Chem., K. B. Lipkowitz and D. B. Boyd Eds., VCH Publishers, Inc., New York (1996) 7, 119 373. D. Rinaldi, J.-L. Rivail and N. Rguini, J. Comput. Chem., 13 (1992) 675 374. S. W. Rick, S. J. Stuart and B. J. Berne, J. Chem Phys., 101 (1994) 6141 375. M. A. Thompson, J. Phys. Chem., 100 (1996) 14492 376. F. J. Luque and M. Orozco, J. Comput. Chem., 19 (1998) 866 377. I. H. Hillier, J. Mol. Structure (Theochem), 463 (1999) 45 378. J. Gao, J. Comput. Chem., 18 (1997) 1061 379. T. Hansson, P. Nordlund and J. Aqvist, J. Mol. Biol., 265 (1997) 118 380. H. J. Kim and J. T. Hynes, J. Am Chem. Soc., 114 (1992) 10508 381. A. J. Mulholland, G. H. Grant and W. G. Richards, Protein Engng., 6 (1993) 133 382. K. Kolmodin, P. Nordlund and J. *qvist, Proteins-Structure Function and Genetics, 36 (1999) 370 383. P. D. Lyne, A. J. Mulholland and W. G. Richards, J. Am. Chem. Soc., 117 (1995) 11345 384. 384 A. J. Mulholland and W. G. Richards, Proteins-Structure Function and Genetics, 27 (1997) 9 385. M. J. Harrison, N. A. Burton and I. H. Hillier, J. Am. Chem. Soc., 119 (1997) 12285 386. M. Per~ikyl~i and P. A. Kollman, J. Am. Chem. Soc., 119 (1997) 1189 387. J. Florifin and A. 
Warshel, J. Phys. Chem. B, 102 (1998) 719 388. Y. Pan and M. A. McAllister, J. Am. Chem. Soc., 120 (1998) 166 389. C. I. Bayly, P. Cieplak, W. D. Cornell and P. A. Kollman, J. Phys. Chem., 97 (1993) 10269
595
390. A. J. Mulholland and W. G. Richards, in Transition State Modeling for Catalysis, D. G. Truhlar and K. Morokuma eds., ACS Symposium 721 (1999) 448 391. G. J. Tawa, I. A. Topol, S. K. Burt and J. W. Erickson, J. Am. Chem. Soc., 120 (1999) 8856 392. H. Liu, F. Mtiller-Plathe and W. F. van Gunsteren, J. Mol. Biol., 261 (1996) 454 393. D. C. Chatfield, K. P. Eurenius, B. R. Brooks, J. Mol. Structure (Theochem), 423 (1999) 79 394. J. Mavri, Int. J. Quantum Chem., 69 (1998) 753 395. MNDO94, Unichem, version 3.0, available from Cray Research Inc., Eagan, MN.
This Page Intentionally Left Blank
L.A. Eriksson (Editor)
Theoretical Biochemistry: Processes and Properties of Biological Systems
Theoretical and Computational Chemistry, Vol. 9
© 2001 Elsevier Science B.V. All rights reserved
Chapter 14
The QM/MM Approach to Enzymatic Reactions
Adrian J. Mulholland
School of Chemistry, University of Bristol, Bristol BS8 1TS, UK
QM/MM methods combine a quantum mechanical (QM) treatment of the electronic structure of a small active-site region with a simpler 'molecular mechanics' (MM) representation of its environment. They show considerable promise for modelling and simulating enzyme-catalysed reaction mechanisms. As recent applications have shown, they can provide biochemically useful information and valuable insight into enzyme reactions. There is growing interest in the development and application of QM/MM methods in this important area. QM/MM calculations can be carried out at semiempirical molecular orbital, ab initio molecular orbital, or density-functional theory QM levels. This chapter reviews QM/MM methods, and in particular their application to studies of enzyme mechanisms. Basic features of the methods are described, along with aspects of their testing and validation, and some recent developments. Some methods for modelling reactions using the QM/MM approach are briefly described. Practical considerations in the study and simulation of enzymic reactions are outlined. Finally, a number of recent applications of QM/MM methods to enzyme-catalysed reaction mechanisms are reviewed.
1. INTRODUCTION
Enzymes (biological catalysts) mediate most biochemical reactions, so a molecular-level understanding of their activity is a problem of significant and widespread importance [1,2]. From a fundamental point of view, it is of great interest to examine how evolution has arrived at these highly efficient solutions to chemical problems. Improved understanding will also lead to many practical applications. The outstanding catalytic properties of enzymes have long been a source of fascination and envy for chemists. Enzymes are generally highly specific for the reactions they catalyse, for their substrates, and for the stereochemistry of reaction. They are also typically impressively proficient catalysts. For example, the final step of pyrimidine biosynthesis involves decarboxylation of orotidine 5'-monophosphate catalysed by the enzyme
orotidine 5'-monophosphate decarboxylase. The enzyme-catalysed reaction is more than 10^17 times faster than its uncatalysed counterpart [3]. Similarly, comparing the first-order rate constant for uncatalysed glycoside hydrolysis with kcat for β-amylase shows that the enzyme accelerates hydrolysis by a factor of more than 10^17 [4]. These impressive catalytic properties are combined with biodegradability, low toxicity, and the ability to function under generally mild and often aqueous conditions. This makes enzymes attractive for practical catalytic applications in industry, biotechnology, chemical analysis and organic synthesis. Understanding the principles underlying these properties may allow practical catalysts to be developed, either by modifying natural enzymes or by designing biomimetic compounds that embody these principles. Better understanding of enzyme mechanisms will also contribute to the design of inhibitors (many drugs are enzyme inhibitors). As Pauling pointed out [5], compounds that resemble the transition state for an enzyme-catalysed reaction should bind with high affinity, making them ideal inhibitors. In the design of inhibitors as potential pharmaceutical lead compounds, mechanistic knowledge can be invaluable. Understanding the roles of individual amino acids in an enzyme mechanism is vital to understanding the effects of mutations. It will therefore greatly assist in the interpretation of genetic data, for example in relation to disease and drug metabolism. Another major challenge is analysing the regulation of enzyme activity in metabolic processes. Structural and genetic data on enzymes are being generated in large quantities, and there is a pressing need for techniques that help to analyse these data and to relate structure to function.
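Conventional transition state theory (the Eyring equation) can convert the rate enhancements quoted above into an effective lowering of the activation free energy. The sketch below is an illustration added here and is not part of the original chapter. It assumes that the catalysed and uncatalysed reactions share the same pre-exponential factor and have a transmission coefficient of one. Under those assumptions, a rate ratio k_cat/k_uncat maps to ΔΔG‡ = RT ln(k_cat/k_uncat).

```python
import math

# Physical constants (CODATA values); energies in kcal/mol
R_KCAL = 1.987204e-3   # gas constant, kcal mol^-1 K^-1
KB = 1.380649e-23      # Boltzmann constant, J K^-1
H = 6.62607015e-34     # Planck constant, J s


def eyring_rate(dg_act_kcal: float, temperature: float = 298.15) -> float:
    """First-order rate constant (s^-1) from the Eyring equation,
    k = (kB*T/h) * exp(-dG_act / RT), with the transmission coefficient set to 1."""
    return (KB * temperature / H) * math.exp(-dg_act_kcal / (R_KCAL * temperature))


def barrier_lowering(rate_enhancement: float, temperature: float = 298.15) -> float:
    """Activation free energy lowering (kcal/mol) implied by k_cat/k_uncat,
    assuming both rates obey the Eyring equation with the same prefactor."""
    return R_KCAL * temperature * math.log(rate_enhancement)


if __name__ == "__main__":
    ddg = barrier_lowering(1e17)
    print(f"A 10^17-fold rate enhancement corresponds to ~{ddg:.1f} kcal/mol")
```

At room temperature, a 10^17-fold acceleration corresponds to a barrier reduction of roughly 23 kcal/mol. This is only a rough guide, because real enzymatic and solution reactions need not share the same prefactor or mechanism.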
An essential first step in studying an enzyme reaction is to establish the chemical mechanism, which has generally proved difficult to do by experimental analysis alone. Any reaction intermediates must be identified, and the precise roles of catalytic residues must be determined. The transition state plays a central part in theories of catalysis, so the nature of the transition state for a particular process should be established. Further, the contributions of specific interactions and other factors to stabilization of transition states and intermediates [6,7], and to destabilization of reactants [8], must be analysed. Many enzymes undergo large conformational changes as part of their reaction cycles [1], and the function of these changes and their relationship to the chemical steps of the reaction should be explored. Furthermore, it is now clear that proteins are complex dynamic entities, exhibiting a wide range of internal motions, some of which have functional significance [9]. The effects of protein dynamics on the reaction should therefore also be considered. Quantum effects such as hydron tunnelling may also be important in some enzyme reactions [10,11], so a complete description will require these to be taken into account. Also, to understand fully the origins of an enzyme's catalytic power, the enzymic
reaction should be compared with the equivalent reaction in solution [12]. Clearly, the details of the reaction in the enzyme must be certain before a proper comparison can be made. Overall, this presents a considerable theoretical and experimental challenge. Enzymes have been studied extensively by a wide range of experimental techniques [1]. However, for many enzymes it has proved difficult to distinguish between alternative mechanisms and to analyse contributions to catalysis. It is generally extremely difficult to study unstable species such as transition states and reaction intermediates in enzymes directly. Structures solved by X-ray crystallography and NMR spectroscopy have transformed understanding of enzyme catalysis, and detailed kinetic studies have defined many features of their reaction schemes, but it has been difficult to relate kinetic and structural data. Site-directed mutagenesis can help to identify residues involved in catalysis and binding. However, the results can be hard to interpret or even misleading, because a mutation may induce subtle changes in structure [13] or changes in mechanism [14] that complicate the analysis. Lysozyme provides an example of how difficult it is to establish mechanistic details. Hen egg-white lysozyme was the first enzyme whose structure was solved. Phillips and coworkers [15] solved it and proposed a mechanism based on the structure of an inhibitor complex and on organic chemical intuition. This work, done over thirty years ago, provided a tremendous impetus to the whole field [1]. Subsequent intensive experimental work by many groups has confirmed many details of the mechanism and provided significant additional insights, but even now some aspects remain unresolved [16].
1.1. Simulation approaches
Computer simulation and modelling methods have the potential to make a significant contribution to studies of enzyme mechanisms, complementing experimental investigations [17,18]. Taking experimental data, particularly structural information, as a starting point, calculations can address questions that are difficult to resolve by experiment alone. For example, the structures of transition states and reaction intermediates can be modelled, as can the interactions responsible for stabilizing these species in the enzyme [7,19]. Ultimately, it should be possible to calculate the energy profile for the reaction in the enzyme and to dissect the energetic contributions to lowering the activation energy. Simulations should provide an atomic-level description of the reaction, including the dynamics of the system. Given the complexity of these problems, it will of course be essential to test theoretical methods against experimental data. The information provided by simulations can then give unique insight into the question of how enzymes 'work'.
Enzymes are generally large molecules (protein or nucleic acid macromolecules), and this presents a major challenge for simulation techniques. For studies of the dynamics and conformational changes of proteins and other biological macromolecules, a number of empirical 'molecular mechanics' (MM) potential functions have been developed and successfully applied [20-22]. Typically, these functions represent atoms as point partial charges with van der Waals radii. Most current potential functions do not represent electronic polarization; that is to say, the partial charges are invariant. Bonds and bond angles are represented by simple harmonic terms, torsion angles by simple periodic terms, and other intramolecular interactions by additional terms where necessary. These simple functional forms allow energies and forces to be evaluated rapidly. The calculations are therefore computationally efficient, allowing simulations of small proteins in solution on the nanosecond timescale. However, potential functions of this type cannot model the bond-breaking, bond-making and electronic reorganization of a chemical reaction. For example, the bond terms do not allow bond dissociation or formation, and electronic redistribution cannot be accounted for. Also, the parameters are developed from the properties of stable molecules, and so are unlikely to apply to transition states and intermediates. One approach to modelling reactions is to develop parameters specifically for reactions, and this has indeed been highly successful for organic reactions in solution [23]. However, the parameters are generally applicable only to a particular reaction, or a small class of reactions, so reparameterization will be necessary for each problem studied. The form of the potential function can also impose limitations, such as the neglect of electronic polarization.
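The functional forms just described can be written down in a few lines. The sketch below is an illustration added here, using terms in the style of common biomolecular force fields (harmonic bonds and angles, periodic torsions, fixed point charges, 12-6 Lennard-Jones). It is not any particular force field, and all parameter values are hypothetical. For comparison, it adds a Morse bond, which is not an MM term, to show why a harmonic bond term cannot describe bond dissociation.

```python
import math

COULOMB_KCAL = 332.0636  # converts e^2/Angstrom to kcal/mol


def bond_harmonic(r: float, k_b: float, r0: float) -> float:
    """Harmonic bond stretch, E = k_b (r - r0)^2; grows without limit, so the bond never breaks."""
    return k_b * (r - r0) ** 2


def angle_harmonic(theta: float, k_theta: float, theta0: float) -> float:
    """Harmonic angle bend, E = k_theta (theta - theta0)^2 (angles in radians)."""
    return k_theta * (theta - theta0) ** 2


def torsion_periodic(phi: float, v_n: float, n: int, gamma: float) -> float:
    """Periodic torsion, E = (V_n / 2) [1 + cos(n*phi - gamma)]."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def nonbonded(r: float, q_i: float, q_j: float, epsilon: float, r_min: float) -> float:
    """Fixed point charges (no polarization) plus a 12-6 Lennard-Jones van der Waals term."""
    coulomb = COULOMB_KCAL * q_i * q_j / r
    lj = epsilon * ((r_min / r) ** 12 - 2.0 * (r_min / r) ** 6)
    return coulomb + lj


def bond_morse(r: float, d_e: float, a: float, r0: float) -> float:
    """Morse bond for comparison only: E -> d_e as r -> infinity, so the bond can dissociate."""
    return d_e * (1.0 - math.exp(-a * (r - r0))) ** 2


if __name__ == "__main__":
    # Hypothetical C-H-like parameters (kcal/mol, Angstrom), for illustration only
    k_b, r0, d_e = 340.0, 1.09, 110.0
    a = math.sqrt(k_b / d_e)  # gives the Morse curve the same curvature at r0 as the harmonic term
    for r in (1.09, 1.5, 2.0, 3.0):
        print(f"r = {r:.2f} A  harmonic = {bond_harmonic(r, k_b, r0):8.1f}  "
              f"Morse = {bond_morse(r, d_e, a, r0):6.1f} kcal/mol")
```

Near equilibrium the two bond curves agree. As the bond is stretched, the harmonic energy climbs into the thousands of kcal/mol, whereas the Morse energy levels off below the dissociation energy. The charges passed to `nonbonded` never change with the environment, which is the "invariant partial charges" limitation noted above.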
For studies of small molecules and their reactions, electronic structure calculations have proved extremely useful. Ab initio molecular orbital calculations [24], and increasingly methods based on density-functional theory [25], can be used to optimize the structures of stable molecules and transition states, and to calculate potential energy surfaces and reaction pathways. Rates of reaction can then be calculated by a variety of methods [26]. Systems in solution can be modelled using continuum solvation models, such as reaction field methods (e.g. the polarizable continuum model [27]). However, ab initio and density-functional theory calculations at levels approaching chemical accuracy are extremely demanding computationally. Because of the way their cost scales with system size, the computer requirements rise steeply as the system grows. For practical studies of reactions, such calculations are presently limited to aperiodic systems of the order of tens of atoms, although developments such as the 'divide and conquer' approach promise to expand this range considerably [28]. Semiempirical molecular orbital methods, such as
those based on the Modified Neglect of Diatomic Overlap (MNDO [29]) approximation (for example AM1 [30] and PM3 [31]), are considerably less computationally intensive. Consequently, they can be applied to much larger systems, of the order of hundreds of atoms. Divide and conquer [32,33] and localized orbital [34] techniques have been developed that allow semiempirical electronic structure calculations on whole proteins. However, solvated enzyme complexes can contain tens of thousands of atoms, currently out of reach even of semiempirical methods. Equally important, modelling a reaction requires extensive geometry optimization, potential energy surface exploration, conformational sampling or dynamics simulation. These are significant challenges for large molecules even with cheap quantum chemical methods. Enzymes have well-defined three-dimensional structures containing many polar groups and are clearly not homogeneous, which makes continuum reaction field models inappropriate. The environment of the enzyme must also be considered. It is typically aqueous solution, although some enzymes operate in concentrated solutions, in membranes, or in protein or nucleic acid complexes. The sheer size of enzymes is therefore a major challenge for methods that aim to simulate enzymic reactions. To model a reaction, it is necessary at least to optimize the geometry of the system, locating minima and transition state structures, and developing methods for such optimizations in large systems presents difficulties in itself. Also, the potential energy surfaces for protein internal motions are highly complex, with many minima. A multitude of conformational substates can therefore be expected to contribute at normal temperatures, and a single structure will not necessarily be truly representative [9]. Similarly, as mentioned above, the dynamics of the enzyme may be important.
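The steep growth in cost mentioned above can be made concrete with formal scaling exponents. In the sketch below, added for illustration, the exponents are the usual textbook asymptotic values (cost ~ N^p, taking N as proportional to the number of atoms), which is an assumption. Real timings also depend on prefactors, integral screening and linear-scaling techniques such as divide and conquer, which this rough comparison ignores. The two sizes compare a cluster of about 30 atoms with a solvated enzyme of about 30,000 atoms, matching the orders of magnitude in the text.

```python
# Formal scaling exponents p (cost ~ N^p); textbook asymptotic values, assumed here.
# Prefactors, screening and linear-scaling methods are deliberately ignored.
FORMAL_SCALING = {
    "semiempirical (diagonalization)": 3,
    "Hartree-Fock": 4,
    "MP2": 5,
    "CCSD(T)": 7,
}


def cost_ratio(n_small: float, n_large: float, exponent: int) -> float:
    """Factor by which the formal cost grows when system size goes from n_small to n_large."""
    return (n_large / n_small) ** exponent


if __name__ == "__main__":
    # ~30 atoms (a small cluster model) versus ~30,000 atoms (a solvated enzyme)
    for method, p in FORMAL_SCALING.items():
        print(f"{method:32s} N^{p}: cost x {cost_ratio(30, 30_000, p):.0e}")
```

Even for the cheapest entry, a thousandfold increase in size multiplies the formal cost by about 10^9. This is why whole-enzyme treatments rely either on linear-scaling tricks or on partitioning the system.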
For simulations of dynamic effects to be feasible (or indeed for conformational sampling and calculations of free energy differences), a simulation method must be capable of calculating trajectories of at least many picoseconds. One approach to modelling enzyme reactions is through calculations on clusters of small molecules representing important functional groups (for example amino acid side chains involved in catalysis or binding, substrate, etc.), with their positions typically taken from a representative X-ray crystal structure of an enzyme complex [7,17]. For example, acetate may be chosen to represent an aspartate side chain, imidazole to represent histidine, and so on. It may then be possible to optimize the geometries of complexes representing the reactants, transition state, intermediates and products of steps in the reaction. Such calculations can be used to examine the nature of interactions between groups at the active site, and can provide useful models of transition states and intermediates. They can also assist greatly in testing the accuracy of different levels of calculation for a given application [19] (e.g. comparing the results of semiempirical with ab initio molecular orbital calculations, or different levels of ab initio treatment, for a particular reaction [35,36]). However, it is probable that the influence of the surrounding enzyme (and solvent) on the reaction is significant and should be represented if the chemical behaviour is to be modelled correctly. Important functional groups may not be included in a small model. An important practical consideration [19] is that it may be extremely difficult to optimize the geometry of the model (for example to locate a transition state structure): in the absence of the bonded and nonbonded restraints of the enzyme complex, the molecules representing the functional groups are likely to drift and interact in ways which would be impossible in the enzyme. On the other hand, imposing constraints on the model may prevent successful optimization, particularly as the initial structure is likely to differ somewhat from the actual reactive complex. Ideally, a method for the simulation of enzyme reactions should capture the essential details of the chemical reaction (fundamentally quantum mechanical in nature) while treating the whole system. Many groups have worked on the development of such techniques, which are far too numerous to cover here; some have been reviewed previously [17]. One notable method is the empirical valence bond (EVB) model of Warshel and coworkers [37]. Warshel, Åqvist and others have applied EVB simulations to study reactions in a large number of enzymes and in solution. In the EVB technique, a number of resonance structures are chosen to represent the reaction. The energies of these resonance forms are given by simple empirical force fields. The overall EVB Hamiltonian is calibrated to reproduce experimental data for the reaction in solution. Other workers have parameterized valence bond models using ab initio results [38].
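To make the idea of mixing resonance structures concrete, the sketch below takes the simplest two-state case. It assumes that each resonance-structure (diabatic) energy is a harmonic function of a single hypothetical reaction coordinate and that the coupling element H12 is a constant. The EVB ground-state energy is then the lower eigenvalue of the 2 x 2 Hamiltonian. The function names and all parameter values are illustrative assumptions, not those of any published EVB implementation, where the diabatic energies come from full force fields evaluated over the whole solvated system.

```python
import math


def evb_ground_state(h11: float, h22: float, h12: float) -> float:
    """Lower eigenvalue of the symmetric 2x2 EVB Hamiltonian [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    half_gap = 0.5 * (h11 - h22)
    return mean - math.sqrt(half_gap * half_gap + h12 * h12)


def diabatic_energy(x: float, x0: float, k: float, shift: float) -> float:
    """Hypothetical harmonic energy of one resonance structure along coordinate x."""
    return 0.5 * k * (x - x0) ** 2 + shift


def evb_profile(xs, k=100.0, shift2=0.0, h12=5.0):
    """Ground-state EVB energy along a toy reaction coordinate.

    Resonance structure 1 (reactant) has its minimum at x = 0 and
    resonance structure 2 (product) at x = 1, offset by shift2.
    """
    profile = []
    for x in xs:
        h11 = diabatic_energy(x, 0.0, k, 0.0)
        h22 = diabatic_energy(x, 1.0, k, shift2)
        profile.append(evb_ground_state(h11, h22, h12))
    return profile


if __name__ == "__main__":
    xs = [i / 10 for i in range(11)]
    for x, e in zip(xs, evb_profile(xs)):
        print(f"x = {x:.1f}  E_EVB = {e:7.3f}")
```

With these arbitrary units, the two diabatic curves cross at x = 0.5 with energy 12.5, and the coupling lowers the top of the ground-state curve to 7.5. The coupling and the relative offset of the diabatic states are the kind of quantities that are adjusted when an EVB Hamiltonian is calibrated against data for the reaction in solution.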
The free energy of activation for the reaction in solution, and in the enzyme, can be calculated using the EVB model and free energy perturbation simulations. These approaches have been reviewed extensively elsewhere [12,17,37,39,40]. A most promising approach to the simulation of enzyme-catalysed reactions is through methods which combine a quantum chemical description of the groups directly involved in the reaction, to calculate the electronic structure of the reacting system, with a simpler molecular mechanics treatment of the enzyme and the environment [17-19,41]. These are described as quantum mechanical/molecular mechanical methods, QM/MM for short. Interest in QM/MM methods has grown rapidly in recent years, and it is now clear that they can provide biochemically useful and relevant insight into the mechanisms of enzymic reactions [18,19,42,43]. These methods, and their application, are the focus of this chapter.
2. THEORY
2.1. Background
The past decade has seen tremendous growth in the development and application of QM/MM methods in many areas. When the field of computer modelling of enzyme-catalysed reaction mechanisms was reviewed in the early 1990s [17], the number of studies applying QM/MM methods was very limited. Warshel and Levitt's study of lysozyme (1976) [44] was seminal in the field. Singh and Kollman [45] developed a method for ab initio molecular orbital QM/MM calculations, which was applied by Waszkowycz, Hillier and coworkers to study amide and ester hydrolysis by phospholipase A2 [46,47]. The computational limitations of the time, however, generally placed severe restrictions on QM/MM applications at the ab initio level. Field, Bash and Karplus [41] developed a QM/MM method at the semiempirical molecular orbital level (AM1 or MNDO), which in its first application to an enzyme was used to study the mechanism of triosephosphate isomerase [48]. Semiempirical QM/MM calculations, with their considerably lower computational requirements, held the promise of more extensive studies. In general, though, only very few enzymes had at that stage been studied by QM/MM methods. The great potential of QM/MM calculations had been recognized, and in recent years it has begun to be realized. Indeed, the number of studies published since the earlier review is now so large that it is not possible to cover them all here.
Among the enzymes which have been studied at various QM/MM levels are para-hydroxybenzoate hydroxylase [49,50], citrate synthase [6,18,19,51], aldose reductase [52,53], HIV protease [54,55], carbonic anhydrase [56,57], lactate dehydrogenase [58,60,167], malate dehydrogenase [61], papain [62,63], tyrosine phosphatase [64-66], enolase [67], aspartate transcarbamylase [68,69], galactose oxidase and a synthetic biomimetic compound [70], neuraminidase [71,72], nickel-iron hydrogenase [73], glutathione reductase [74], ribulose 1,5-bisphosphate carboxylase [59], chorismate mutase [8], subtilisin [75], acetylcholinesterase [76], phospholipase A2 [77], TEM1 β-lactamase [78], thymidine phosphorylase [65], cAMP-dependent protein kinase [79], alcohol dehydrogenase [80], thermolysin [81], orotidine monophosphate decarboxylase [82], triosephosphate isomerase [83,84] and dihydrofolate reductase [34,85,86]. It should be stressed that these studies applied a variety of techniques to examine many different aspects of these enzymes. The QM/MM methods provide a means of calculating the energy of the system and can be thought of as potential functions. Other techniques must be used in tandem to model the reaction, and the development of techniques for modelling reactions in large systems is a complementary area of intensive research. Progress has been aided by methodological development by a number of groups, and of course facilitated by ongoing growth in affordable computer power. QM/MM implementations have been published in which the QM level is treated by semiempirical or ab initio molecular orbital methods, or by density-functional theory. A variety of approaches for partitioning the system into QM and MM regions, and of interaction schemes, have been developed and investigated. The desire to simulate enzyme reaction mechanisms has been a major motivating factor in these developments, but QM/MM techniques are also finding increasingly widespread application in other areas requiring a quantum mechanical treatment of part of a large system. These applications include reactions in other condensed phases (for example in DNA [87], in solution [88-90], in solids such as zeolites [74,91], at surfaces [92], in clusters and in transition metal complexes [70,93]), investigations of solvation and solvent effects [90,94-98], electronic excitations in large molecules [99,100] and in solution [101,102], calculations of adsorption energies in zeolites [103,104], and studies of biological binding interactions [63,105-107]. QM/MM methods will undoubtedly come to play a central role in many areas of computational chemistry and biochemistry.
Figure 1. The essence of the QM/MM approach is that a small region (e.g. at an enzyme active site) is treated by a quantum chemical method, while the bulk is represented more simply by molecular mechanics. It is often necessary also to apply a boundary term because of the finite size of the simulation system.
The basis of the QM/MM approach (Figure 1) is that the process or subsystem of most interest is localized in a fairly small part of a larger system. The computational effort is therefore focused on this small region, which most requires a quantum mechanical description. The bulk of the system is treated more simply by a molecular mechanics potential function. The combination of the efficiency and speed of the MM force field with the versatility and range of applicability of the QM method allows reactions in large systems to be studied. An analogy for this partitioning philosophy can perhaps be drawn with approximations such as the Hückel approach for conjugated hydrocarbons, in which the π electrons are considered separately from the σ-bonded framework of the molecule, which is treated as having a purely structural function. Similarly, hybrid techniques were developed for conjugated molecules in which the framework was treated by molecular mechanics while the energies of the π electron system (of ground and excited states, for example) were calculated by a quantum mechanical procedure [41,108]. It is fair to say that the QM/MM field was pioneered by Warshel and Levitt's work on lysozyme [44]. They introduced a method to treat a small reacting system by semiempirical molecular orbital calculations while treating the protein and solvent by empirical energy functions, including interactions between the two systems, so allowing quantum chemical modelling of a reaction in a system as large as an enzyme. Other approaches to including environmental effects in quantum chemical calculations have been taken, and have become increasingly sophisticated (for example the effective fragment potential method of Krauss and coworkers, which has been applied to model enzyme reactions [109,110]); there is clear common ground between these and QM/MM methods. Hybrid methods combining different levels of QM calculation for different parts of a system have also been developed [111]. However, an essential aspect of QM/MM methods, as defined here, is that they should contain a molecular mechanics component. This MM component allows the energy of the bulk system to be calculated rapidly, its structure to be optimized, and so on, at low computational cost.
Bakowies and Thiel [112] have put forward a useful classification of QM/MM methods according to the type of QM/MM coupling employed:
A: Type A models are the simplest QM/MM models, using a simple mechanical embedding scheme. The QM/MM interactions are treated exactly as the same interactions would be in a purely classical MM calculation. Polarization of the QM atoms by the MM environment is therefore not included.
B: Type B models allow polarization of the QM system by the MM system, by including the charges of the MM groups in the QM calculation. The electronic calculation includes the effects of the MM system.
C: Type C models go beyond type B by also including polarization of the MM region (for example through a dipole interaction model).
D: Type D models represent the most complex level of QM/MM coupling, including self-consistent polarization of the MM region through an iterative procedure.
In the simplest (type A) models, the energy of the whole system is given by subtracting the MM energy of the QM system from the MM energy of the whole system, and adding the QM energy of the QM system. The gradient and higher derivatives can be defined similarly. This method is well defined and straightforward to apply. Multiple levels of theory can be combined (e.g. a high level correlated ab initio method for a small region, an intermediate level of ab initio theory for a somewhat larger region, and a semiempirical or molecular mechanics treatment of the whole system). The IMOMM, IMOMO and generalized ONIOM approaches successfully developed by Morokuma and coworkers [111,113,114] and others [115] function on this basis (though extensions beyond this level of treatment are being developed within this formalism). The methods have proved very useful in treatments of, for example, transition metal compounds and fullerenes [111,113-115]. However, enzymes are highly polar molecules, containing many polar and charged groups. For modelling enzyme reactions, the lack of polarization of the QM system by the MM environment is potentially a serious drawback, and the influence of the MM system on the QM calculation should preferably be included. Most studies of enzyme reactions to date have employed QM/MM models of type B, and consequently it is methods of this type we will concentrate on here.
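The subtractive (type A) energy expression, and its extension to several nested levels of theory, can be sketched as follows. This is a minimal illustration, not the IMOMM or ONIOM code: it assumes that each level of theory is supplied as a plain function of atomic coordinates, and that no covalent bonds cross any layer boundary, so link atoms are ignored. The names `subtractive_energy` and `pair_sum` are hypothetical, and `pair_sum` is only a stand-in for a real QM or MM energy.

```python
import math
from typing import Callable, Sequence

Coord = tuple[float, float, float]
EnergyFn = Callable[[Sequence[Coord]], float]


def subtractive_energy(coords: Sequence[Coord],
                       layers: Sequence[Sequence[int]],
                       methods: Sequence[EnergyFn]) -> float:
    """Nested subtractive (type A) energy.

    layers[0] is the innermost region and layers[-1] must contain every atom;
    methods[k] is the level of theory applied to layers[k] (highest level first).
    """
    if len(layers) != len(methods):
        raise ValueError("need one method per layer")
    if sorted(layers[-1]) != list(range(len(coords))):
        raise ValueError("the outermost layer must be the whole system")

    def select(indices):
        return [coords[i] for i in indices]

    # Lowest level of theory applied to the whole system ...
    total = methods[-1](select(layers[-1]))
    # ... then, for each inner layer, replace its level-(k+1) energy by the level-k energy.
    for k in range(len(layers) - 1):
        inner = select(layers[k])
        total += methods[k](inner) - methods[k + 1](inner)
    return total


def pair_sum(scale: float) -> EnergyFn:
    """Toy 'method' (hypothetical): scale times the sum of inverse interatomic distances."""
    def energy(xyz):
        return scale * sum(1.0 / math.dist(a, b)
                           for i, a in enumerate(xyz) for b in xyz[i + 1:])
    return energy


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 0.0, 0.0)]
    # Two layers: E = E_MM(all) - E_MM(QM) + E_QM(QM), with atoms 0 and 1 as the "QM" region.
    e_qm, e_mm = pair_sum(2.0), pair_sum(1.0)
    print(subtractive_energy(coords, [[0, 1], [0, 1, 2, 3]], [e_qm, e_mm]))
```

With two layers the loop reproduces exactly the expression in the text. With three layers it gives E_low(all) - E_mid(mid) + E_mid(mid-region only through the inner swap) in the usual ONIOM-style combination. Gradients would be assembled term by term in the same way.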
Extensions to type C and beyond are challenging because of the extra computational effort required for calculation of polarization of the MM system, and also because the MM potential functions which have been developed for biological macromolecules (such as those of AMBER [21], CHARMM [20,116] and GROMOS [22]) do not allow for changes in charge distribution. Rather, polarization is included in an average way: for example, in the MM parameterization process in CHARMM, ab initio gas-phase interaction energies are used as target data, but are increased by a factor of 1.16 to represent the strengthening of polar interactions through polarization in solution (and in proteins) [20]. Achieving consistency between the polarization of the MM system and the QM description used is therefore also a challenge. QM/MM methods which include polarization of the MM system have been developed for smaller molecules. Polarizable force fields for biological molecules are the subject of much current research, and will be an important direction for future QM/MM research [102,117-119]. QM/MM calculations can assist in this development process, in assessing polarization effects for small (QM) regions in large biomolecules [120].
2.2. Basic theory
The theoretical bases of QM/MM methods have been covered in detail by many groups [41,90,112,121,122], and so will only briefly be outlined here. The QM/MM partitioning is simplest when the border between the QM and MM regions does not separate covalently bonded atoms; this case will be considered first. Following Field et al. [41], the effective Hamiltonian for the whole QM/MM system can be considered as being made up of various terms:
$$\hat{H}_{\mathrm{eff}} = \hat{H}_{\mathrm{QM}} + \hat{H}_{\mathrm{MM}} + \hat{H}_{\mathrm{QM/MM}} + \hat{H}_{\mathrm{Boundary}} \qquad (1)$$
where the terms are respectively the Hamiltonian of the pure QM system, that of the MM system, the interaction between the QM and MM systems, and finally any boundary terms applied to the simulation system to represent the effects of the bulk surroundings. The total energy of the system is given by:
$$\hat{H}_{\mathrm{eff}}\,\Psi(\mathbf{r}, \mathbf{R}_\alpha, \mathbf{R}_M) = E(\mathbf{R}_\alpha, \mathbf{R}_M)\,\Psi(\mathbf{r}, \mathbf{R}_\alpha, \mathbf{R}_M) \qquad (2)$$
where $\Psi$ is the electronic wavefunction of the QM system, which is a function of the coordinates $\mathbf{r}$ of the electrons and also depends on the coordinates of the nuclei in the quantum system, $\mathbf{R}_\alpha$, and of the atoms in the MM region, $\mathbf{R}_M$. From the definition of the effective Hamiltonian in equation (1), the total energy of the system is:
$$E = E_{\mathrm{QM}} + E_{\mathrm{MM}} + E_{\mathrm{QM/MM}} + E_{\mathrm{boundary}} \qquad (3)$$
The energy of the MM system, $E_{\mathrm{MM}}$, and the energy of the QM system, $E_{\mathrm{QM}}$, are calculated as in standard calculations at those respective levels. The MM energy is defined by the potential function used, usually consisting of terms representing bond stretching, bond angle bending, dihedral (and 'improper' dihedral) angles, electrostatic interactions (with atoms typically represented by atom-centred, fixed point charges) and van der Waals interactions [9,20-22,123]. The boundary energy arises (as in MM simulations) because the simulation system necessarily includes only a finite number of atoms, so terms to reproduce the effects of the bulk must be introduced.
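The MM energy terms listed above can be sketched as follows. This is a minimal illustration assuming a generic biomolecular functional form: harmonic bonds and angles, a periodic cosine torsion, fixed point charges and a 12-6 Lennard-Jones term written with sigma and epsilon. Improper dihedrals are omitted. Force fields such as AMBER, CHARMM and GROMOS differ in details (factors of 1/2, Lennard-Jones parameterization, 1-4 scaling), and the Coulomb conversion factor for kcal/mol, angstrom and elementary charges is an assumed value.

```python
import math

Vec = tuple[float, float, float]
COULOMB_KCAL = 332.0636  # kcal*Angstrom/(mol*e^2); assumed conversion factor


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_energy(xi: Vec, xj: Vec, kb: float, b0: float) -> float:
    """Harmonic bond stretch, kb * (b - b0)^2."""
    return kb * (math.dist(xi, xj) - b0) ** 2


def angle_energy(xi: Vec, xj: Vec, xk: Vec, ktheta: float, theta0: float) -> float:
    """Harmonic angle bend about central atom j (angles in radians)."""
    u, v = _sub(xi, xj), _sub(xk, xj)
    cos_t = _dot(u, v) / (math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v)))
    theta = math.acos(max(-1.0, min(1.0, cos_t)))
    return ktheta * (theta - theta0) ** 2


def dihedral_angle(xi: Vec, xj: Vec, xk: Vec, xl: Vec) -> float:
    """Torsion angle i-j-k-l in radians, in [-pi, pi]."""
    b1, b2, b3 = _sub(xj, xi), _sub(xk, xj), _sub(xl, xk)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


def dihedral_energy(xi, xj, xk, xl, kphi: float, n: int, delta: float) -> float:
    """Periodic torsion term, kphi * (1 + cos(n*phi - delta))."""
    phi = dihedral_angle(xi, xj, xk, xl)
    return kphi * (1.0 + math.cos(n * phi - delta))


def nonbonded_energy(xi: Vec, xj: Vec, qi: float, qj: float,
                     epsilon: float, sigma: float) -> float:
    """Coulomb energy of fixed point charges plus a 4*epsilon Lennard-Jones term."""
    r = math.dist(xi, xj)
    sr6 = (sigma / r) ** 6
    return COULOMB_KCAL * qi * qj / r + 4.0 * epsilon * (sr6 * sr6 - sr6)
```

The total $E_{\mathrm{MM}}$ is the sum of these terms over the bond, angle and torsion lists of the molecule and over the nonbonded pairs that are not excluded.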
For example, periodic boundary conditions can be applied [124] (although special consideration must be given to the interaction of the QM group with its images), or, where only part of a protein can be included, the stochastic boundary method for dynamics can be applied [9,125]. The QM/MM interaction energy, $E_{\mathrm{QM/MM}}$, is found by application of the QM/MM Hamiltonian, which typically consists of terms due to electrostatic interactions and van der Waals interactions. In an ab initio QM/MM calculation, the MM atomic point charges are generally included directly through one-electron integrals, together with the interaction of the classical charges with the nuclei of the QM system. The van der Waals QM/MM interactions, on the other hand, are usually calculated by a molecular mechanics procedure (e.g. through Lennard-Jones terms), exactly as the corresponding interactions would be calculated between MM atoms not interacting through bonding terms. It is therefore necessary to assign MM van der Waals parameters to each QM atom. The van der Waals terms represent dispersion and exchange-repulsion interactions between the QM and MM systems, and differentiate MM atom types in their interaction with the QM system. This is particularly important in differentiating between atoms of the same charge (e.g. halide ions), which could otherwise be indistinguishable to the QM system, and for MM atoms with charges close to zero. The van der Waals terms are important at close range, and play an important part in determining interaction energies and geometries. The (MM) parameters for the QM atoms can be optimized to reproduce experimental or high level ab initio results for small complexes [126,127]. The simple point charge model has obvious limitations.
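The classical parts of $E_{\mathrm{QM/MM}}$ in an ab initio type B scheme can be sketched as below. The sketch computes the Coulomb interaction of the QM nuclei with the MM point charges and a Lennard-Jones term between every QM and MM atom. The interaction of the QM electrons with the MM charges is deliberately not computed here: as described above, it enters the QM calculation through one-electron integrals. The Lorentz-Berthelot combining rules, the units and all parameter values are assumptions made for illustration.

```python
import math

COULOMB_KCAL = 332.0636  # kcal*Angstrom/(mol*e^2); assumed conversion factor


def lj_combined(eps_a, sigma_a, eps_b, sigma_b):
    """Lorentz-Berthelot combining rules (an assumption; force fields differ)."""
    return math.sqrt(eps_a * eps_b), 0.5 * (sigma_a + sigma_b)


def qmmm_classical_terms(qm_atoms, mm_atoms):
    """Return (nuclear-MM Coulomb energy, van der Waals energy) in kcal/mol.

    qm_atoms: iterable of (xyz, nuclear_charge, epsilon, sigma); each QM atom
              carries MM-style van der Waals parameters.
    mm_atoms: iterable of (xyz, partial_charge, epsilon, sigma).
    """
    mm_atoms = list(mm_atoms)
    e_nuc = 0.0
    e_vdw = 0.0
    for xyz_a, z_a, eps_a, sig_a in qm_atoms:
        for xyz_m, q_m, eps_m, sig_m in mm_atoms:
            r = math.dist(xyz_a, xyz_m)
            e_nuc += COULOMB_KCAL * z_a * q_m / r
            eps, sig = lj_combined(eps_a, sig_a, eps_m, sig_m)
            sr6 = (sig / r) ** 6
            e_vdw += 4.0 * eps * (sr6 * sr6 - sr6)
    return e_nuc, e_vdw


if __name__ == "__main__":
    qm = [((0.0, 0.0, 0.0), 8.0, 0.15, 3.0)]          # hypothetical QM oxygen
    ion_a = [((3.5, 0.0, 0.0), -1.0, 0.10, 3.0)]      # hypothetical anion parameters
    ion_b = [((3.5, 0.0, 0.0), -1.0, 0.10, 4.4)]      # same charge, larger radius
    print(qmmm_classical_terms(qm, ion_a))
    print(qmmm_classical_terms(qm, ion_b))
```

The two hypothetical anions in the usage example have identical charges and so give identical electrostatic terms. Only their van der Waals parameters tell them apart, which is the point made above about halide ions.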
An example of the failure of the classical point charge model in modelling a reaction is provided by Wei and Salahub [89], who found that solvent effects on the barrier to proton transfer in water were incorrectly treated when surrounding water molecules were represented in this way. One improvement to the simple point charge model is the incorporation of polarization of the MM atoms by the QM system (see above). This can be achieved simply by assigning an isotropic dipole polarizability to each MM atom, giving rise to an induced dipole on each MM atom that is proportional to the electric field at that point [118]. Polarization of MM atoms, particularly close to the QM system, is likely to be important, and should ideally be included (QM/MM models of type C above). Such a procedure has the disadvantage of adding an extra layer of complexity (and therefore increasing the computational demands) through the requirement for the induced dipoles to be calculated self-consistently. Also, current MM potential functions for proteins do not allow for changes in charge distribution. Another possible improvement is to allow for charge transfer between the QM and MM regions [128].
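The self-consistent calculation of induced dipoles mentioned above can be sketched with a simple fixed-point iteration. Each MM atom i has an isotropic polarizability, and its dipole is $\mu_i = \alpha_i (E^0_i + \sum_{j \ne i} T_{ij}\mu_j)$. The external field $E^0_i$ (from the QM charge distribution and the fixed MM charges) is assumed to be supplied by the caller. Atomic units are assumed, in which the field of a point dipole is $(3(\mu\cdot\hat{r})\hat{r} - \mu)/r^3$. This is a sketch of the technique, not the scheme of [118]. In a fully self-consistent (type D) treatment, the QM calculation would then be repeated in the field of these dipoles and the two steps iterated.

```python
import math


def dipole_field(mu, r_vec):
    """Electric field at displacement r_vec from a point dipole mu (atomic units)."""
    r = math.sqrt(sum(c * c for c in r_vec))
    r_hat = [c / r for c in r_vec]
    mu_dot_r = sum(m * h for m, h in zip(mu, r_hat))
    return [(3.0 * mu_dot_r * h - m) / r ** 3 for m, h in zip(mu, r_hat)]


def induced_dipoles(positions, alphas, external_field, tol=1e-10, max_iter=500):
    """Solve mu_i = alpha_i * (E0_i + sum_{j != i} T_ij mu_j) by simple iteration.

    positions: MM atom coordinates; alphas: isotropic polarizabilities;
    external_field: field E0_i at each MM atom, supplied by the caller.
    """
    n = len(positions)
    # Start from the dipoles induced by the external field alone.
    mu = [[alphas[i] * e for e in external_field[i]] for i in range(n)]
    for _ in range(max_iter):
        new_mu = []
        for i in range(n):
            field = list(external_field[i])
            for j in range(n):
                if j == i:
                    continue
                r_vec = [positions[i][k] - positions[j][k] for k in range(3)]
                field = [f + g for f, g in zip(field, dipole_field(mu[j], r_vec))]
            new_mu.append([alphas[i] * f for f in field])
        change = max(abs(a - b) for new, old in zip(new_mu, mu)
                     for a, b in zip(new, old))
        mu = new_mu
        if change < tol:
            return mu
    raise RuntimeError("induced dipoles did not converge (atoms too close?)")


if __name__ == "__main__":
    # Two hypothetical polarizable sites on the x axis in a uniform field along x.
    mu = induced_dipoles([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [1.0, 1.0],
                         [(0.1, 0.0, 0.0), (0.1, 0.0, 0.0)])
    print(mu)
```

For the two collinear sites in the example, mutual induction enhances each dipole from 0.1 to 0.1/(1 - 2/27) = 0.108. The loop is what adds the extra cost noted in the text. If the sites are too close, the iteration can diverge, which is why the sketch raises an error after `max_iter` steps.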
Other improvements to the point charge representation for MM groups in QM/MM calculations are being investigated. One approach is to replace the point charges by Gaussian distributions, effectively 'smearing out' or 'blurring' the MM charges [56,129]. A multipolar representation of MM groups (e.g. derived from a distributed multipole analysis) would allow a more sophisticated description of their electrostatic potentials than a simple atom-centred point charge model. Polarization and exchange-repulsion terms could also be included, as in the effective fragment potential approach for QM calculations [109,110]. In semiempirical molecular orbital methods such as MNDO [29], AM1 [30] and PM3 [31], electrostatic interactions are treated somewhat differently than in ab initio methods. There is no unique definition of the electrostatic potential in these semiempirical methods, and several expressions for it have been derived [96,112,130,131]. Only valence electrons are treated explicitly in the MNDO-type methods, while the inner electrons of an atom are combined with the nucleus in an invariant positive core. The repulsion of two positively charged cores A and B of charges $Z_A$ and $Z_B$ separated by a distance $R_{AB}$ is not treated by a simple Coulombic expression, because this would predict repulsion between neutral atoms and molecules when in fact van der Waals attraction is observed at separations greater than bond distances. To avoid this excessive repulsion, the core-core terms are treated using the two-electron repulsion integrals, with the monopole charge distribution of an s orbital used to represent each core. However, this function does not reproduce the large repulsion between atoms at very short distances, so two additional functions are used in MNDO to make the core-core repulsion approach the classical value. A slightly different form is used for the interaction of oxygen or nitrogen with hydrogen in an attempt to allow for hydrogen bonding interactions.
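The effect of 'smearing out' an MM charge can be seen by comparing potentials. The sketch below assumes a spherical Gaussian charge density proportional to $\exp(-(r/w)^2)$, whose potential has the closed form $q\,\mathrm{erf}(r/w)/r$. The specific Gaussian form and width are assumptions for illustration, and the schemes of [56,129] may differ in detail. Atomic units are used throughout.

```python
import math


def point_charge_potential(q: float, r: float) -> float:
    """Electrostatic potential of a point charge (atomic units)."""
    return q / r


def gaussian_charge_potential(q: float, r: float, width: float) -> float:
    """Potential of a charge q smeared as rho(r) proportional to exp(-(r/width)**2).

    The closed form q*erf(r/width)/r stays finite as r -> 0, where it tends to
    2q/(sqrt(pi)*width), and approaches the point-charge value q/r for r >> width.
    """
    if r < 1e-12:
        return 2.0 * q / (math.sqrt(math.pi) * width)
    return q * math.erf(r / width) / r


if __name__ == "__main__":
    for r in (0.0, 0.5, 1.0, 2.0, 4.0):
        smeared = gaussian_charge_potential(-0.8, r, width=1.0)
        point = point_charge_potential(-0.8, r) if r > 0 else float("-inf")
        print(f"r = {r:3.1f}  Gaussian: {smeared:8.4f}  point: {point:8.4f}")
```

The comparison shows the intended behaviour: beyond a few widths the two potentials coincide. At short range, where QM electron density overlaps an MM site, the smeared charge no longer produces the divergent attraction or repulsion of a bare point charge.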
However, it was found that MNDO dealt very badly with hydrogen bonded systems; this is one of the major weaknesses of MNDO [132]. In AM1 and PM3, extra Gaussian terms are added to reduce repulsions at about van der Waals separations. In PM3 two Gaussian functions are added for each atom, whereas in AM1 two to four Gaussians per atom are used [132]. These Gaussians may therefore be considered in part as representing van der Waals or dispersion attraction effects; this is an entirely empirical correction. For the interaction of an electron on atom A with the core of another atom B, the core of B is also represented by an s orbital distribution, and the interaction is evaluated using the expression for the two-electron integral. In the AM1/CHARMM semiempirical QM/MM method of Field et al. [41], QM/MM electrostatic interactions are calculated by including the point charges of the MM atoms as atomic cores (which, in the semiempirical methods, represent the nucleus and inner electrons of an atom combined). The MM region atoms are treated as though they were atomic cores in MNDO or AM1. Therefore one-electron core-electron integrals must be calculated to give the effect of the MM charges on the electrons of the QM system (and included in the SCF calculation), and core-core terms give the interaction energy between the positive cores of the QM atoms and the MM atom charges (which may be positively or negatively charged). This core-core energy is included in the final calculation of the energy of the QM system (parameterized as the heat of formation). There is a slight discrepancy in using the AM1 or MNDO heat of formation of the QM system within a QM/MM calculation, because the molecular mechanical energy is a potential energy rather than an enthalpy, but this approximation is likely to make only a small (1 kcal/mol) difference to the total [41]. The form of the core-core interactions used by Field et al. [41] is the same as that used in MNDO [29]. The extra Gaussian terms used in AM1 to moderate the core-core repulsion at approximately van der Waals separations were found not to be necessary for the MM atoms. This is in accord with the interpretation of these terms as empirical corrections for dispersion attraction; such interactions between QM and MM atoms are represented by the classical Lennard-Jones term in the MM potential. Two QM parameters are therefore required for any MM atom M: the exponent $\alpha_M$ used in the MNDO core-core interaction expression, for the calculation of the energy of interaction with the QM cores; and $\rho_0$, representing a monopole distribution on the MM atom, used in the core-core term and also to evaluate the core-electron interaction integrals. After some experimentation, values of $\alpha_M = 5$ and $\rho_0 = 0.0$ were used for all MM atoms [41]. Thus the MM atoms are differentiated in their interactions with QM atoms only by differences in their partial charges and van der Waals parameters.
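A QM/MM core-core term of the MNDO type can be sketched as follows. The sketch assumes the form $E_{AM} = Z_A q_M \gamma_{AM}[1 + e^{-\alpha_A R} + e^{-\alpha_M R}]$, with $\gamma_{AM}$ approximated by the monopole-monopole expression $1/\sqrt{R^2 + (\rho_A + \rho_M)^2}$ in atomic units. The exponents follow the MNDO convention of $\alpha$ in Å⁻¹ (an assumption about units). The defaults $\alpha_M = 5$ and $\rho_M = 0$ are the values quoted above. The QM-atom parameters in the usage example are placeholders, not published MNDO or AM1 values, and no AM1 Gaussian corrections are applied to the MM atom.

```python
import math

ANGSTROM_TO_BOHR = 1.0 / 0.529177
HARTREE_TO_KCAL = 627.5095


def gamma_ss(r_ang: float, rho_a: float, rho_m: float = 0.0) -> float:
    """Monopole-monopole (ss|ss) integral in hartree per e^2; rho values in bohr."""
    r = r_ang * ANGSTROM_TO_BOHR
    return 1.0 / math.sqrt(r * r + (rho_a + rho_m) ** 2)


def qm_mm_core_core(z_a: float, alpha_a: float, rho_a: float, q_m: float,
                    r_ang: float, alpha_m: float = 5.0, rho_m: float = 0.0) -> float:
    """MNDO-style core-core energy (kcal/mol) between QM core A and MM charge q_m.

    alpha_a and alpha_m are in 1/Angstrom; no AM1 Gaussian terms for the MM atom.
    """
    screen = 1.0 + math.exp(-alpha_a * r_ang) + math.exp(-alpha_m * r_ang)
    return HARTREE_TO_KCAL * z_a * q_m * gamma_ss(r_ang, rho_a, rho_m) * screen


def coulomb_kcal(z_a: float, q_m: float, r_ang: float) -> float:
    """Bare point-charge Coulomb energy, for comparison."""
    return HARTREE_TO_KCAL * z_a * q_m / (r_ang * ANGSTROM_TO_BOHR)


if __name__ == "__main__":
    # Placeholder parameters: core charge 4 for a carbon-like atom (valence electrons only).
    z_a, alpha_a, rho_a = 4.0, 2.5, 1.0
    for r in (1.0, 2.0, 3.0, 6.0):
        print(r, round(qm_mm_core_core(z_a, alpha_a, rho_a, -0.5, r), 2),
              round(coulomb_kcal(z_a, -0.5, r), 2))
```

At long range the expression reduces to the ordinary Coulomb interaction between the core charge and the MM partial charge. At short range the finite $\rho$ and the exponential factors modify it, which is where the MNDO-type form differs from a simple point-charge model.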
2.3. QM/MM partitioning schemes
The QM/MM treatments described above are sufficient for treating small molecules in solution. In many enzymes, however, some amino acid side chains participate directly in the reaction, undergoing chemical change as part of the mechanism, and must therefore be included in the QM region. Other side chains play binding roles, for which an MM representation may be inadequate. Likewise, it may be more practical to treat only the reactive parts of large cofactors or substrates by quantum chemical methods. In most enzymes, therefore, there is a need to be able to partition covalently bonded molecules into QM and MM regions. Simply deleting the MM atoms which are not to be included in the QM calculation is out of the question, as this would result in unpaired electrons and an incorrect description of the electronic state. This problem can be avoided by simple subtraction in schemes such as the ONIOM approach [100,111,133], in which the QM and MM regions do not interact. For QM/MM calculations in which the two regions interact, the unsatisfied valence resulting from partitioning a covalent bond can be satisfied by one of two general approaches (Figure 2): either an extra atom can be introduced into the QM system (a 'link' atom; often a hydrogen atom is used), or a fixed ('frozen') orbital containing a single electron can be used at the boundary. Warshel and Levitt introduced the latter type of approach, using a single sp2 orbital for atoms at the QM/MM junction [44].
Figure 2. Schematic representation of the partitioning of an aspartate amino acid residue into QM and MM regions. The side chain (which may be involved in, e.g., chemical reaction or binding) is treated by QM (up to and including Cβ). To account for the bond between Cα and Cβ, a 'link' atom (typically hydrogen) may be added to Cβ. Alternatively, a localized hybrid orbital on Cα can be employed.
Consideration must also be given as to which MM interactions to include between the MM and QM atoms, e.g. which bonded terms (for bonds, bond angles, dihedral angles, etc.) between the two regions. In the implementation of Field et al. [41], all MM bonded terms involving at least one MM atom are retained. The local self-consistent field (LSCF) or fragment SCF method has been developed for treating large systems [105,134-139]; in it, the bonds at the QM/MM junction ('frontier bonds') are described by strictly localized bond orbitals. These frozen localized bond orbitals are taken from calculations on small models, and remain unchanged in the QM/MM calculation. The LSCF method has been applied at the semiempirical level [134-137], and some developments for ab initio calculations have been made [139]. Gao et al. have developed a similar Generalized Hybrid Orbital method for semiempirical QM/MM calculations, in which the semiempirical parameters of atoms at the junction are modified to enhance the transferability of the localized bond orbitals [140]. Recent developments for ab initio QM/MM calculations include the method of Philipp and Friesner [141], who use Boys-localized orbitals in ab initio Hartree-Fock QM/MM calculations. These orbitals are again taken from calculations on small model systems, and kept frozen in QM/MM calculations.
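The bookkeeping rule attributed above to Field et al. can be sketched for the aspartate partitioning of Figure 2. The rule is to retain every MM bonded term that involves at least one MM atom and to drop terms made up entirely of QM atoms, whose energy the QM calculation already supplies. In this sketch, angle and dihedral lists are generated from a bond list. The atom names, the heavy-atom-only topology and the helper function names are hypothetical, and the link atom is not included, consistent with a link atom that has no MM interactions.

```python
from itertools import combinations


def build_angles_and_dihedrals(bonds):
    """Generate angle (i, j, k) and dihedral (i, j, k, l) terms from a bond list."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    angles = [(i, j, k)
              for j, nbrs in neighbors.items()
              for i, k in combinations(sorted(nbrs), 2)]
    dihedrals = [(i, j, k, l)
                 for j, k in bonds
                 for i in sorted(neighbors[j] - {k})
                 for l in sorted(neighbors[k] - {j, i})]
    return angles, dihedrals


def retained_terms(terms, qm_atoms):
    """Keep an MM bonded term only if at least one of its atoms is an MM atom."""
    qm = set(qm_atoms)
    return [t for t in terms if any(atom not in qm for atom in t)]


if __name__ == "__main__":
    # Heavy atoms of an aspartate residue (hydrogens omitted for brevity).
    bonds = [("N", "CA"), ("CA", "C"), ("CA", "CB"),
             ("CB", "CG"), ("CG", "OD1"), ("CG", "OD2")]
    qm_region = {"CB", "CG", "OD1", "OD2"}  # side chain up to and including CB
    angles, dihedrals = build_angles_and_dihedrals(bonds)
    for name, terms in (("bonds", bonds), ("angles", angles), ("dihedrals", dihedrals)):
        kept = retained_terms(terms, qm_region)
        print(f"{name}: {len(kept)} of {len(terms)} retained")
```

For this partitioning, the Cα-Cβ bond, the angles and dihedrals that span the boundary, and all backbone terms are kept. Only the bonds and angles lying entirely within the carboxylate side chain are handed over to the QM calculation.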
In density-functional theory calculations, it is possible to freeze the electron density associated with an atom at the junction [142]. 'Divide and conquer' methods for QM calculations also suggest possibilities for QM/MM partitioning [143]. Techniques such as these have the advantage over link atom methods that no artificial extra atoms (and their interactions) need to be added to the system. Possible drawbacks include a lack of flexibility in the electronic description due to the freezing of the frontier orbital, particularly in response to chemical changes. It may also be difficult to derive transferable orbitals from model systems which accurately represent the bonding in larger molecules. As with all QM/MM calculations, the junction between the QM and MM regions should lie as far as possible from the area of major electronic reorganization. The link atom (or 'dummy junction atom') approach was first used for QM/MM calculations by Singh and Kollman [45], was also used by Field et al. [41], and has been widely used in QM/MM calculations, largely due to its simplicity. The approach is similar to QM calculations on small models of enzyme active sites, in which C-C bonds 'broken' in excising the functional groups are replaced by C-H bonds [7,17]. Hydrogen atoms are used most frequently as link atoms, but other atom types have also been used (such as pseudohalogens, as in the HyperChem program package [144]). When a reasonable choice of QM/MM boundary can be made, such as when it lies across a carbon-carbon single bond distant both from chemical changes and from highly charged MM atoms, the link atom method can give good results. Reuter et al. have found the LSCF and link atom approaches to perform with similar accuracy in semiempirical QM/MM calculations [145]. Lyne et al. found that the use of link atoms in complexes of biological ligands with metal cations gave good results in ab initio and density-functional QM/MM calculations [121].
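A common way of positioning a hydrogen link atom is to place it on the line from the QM frontier atom towards the MM host atom. This mimics replacing a C-C bond by a C-H bond. The sketch below supports two variants: a fixed link-atom bond length, or a fixed fraction g of the QM-MM distance (the idea behind scaled-position schemes). The C-C and C-H distances in the example are typical values used as assumptions, and `place_link_atom` is a hypothetical helper, not the routine of any program named in the text.

```python
import math


def place_link_atom(r_qm, r_mm, bond_length=None, scale=None):
    """Hydrogen link atom on the line from QM frontier atom r_qm towards MM atom r_mm.

    Give either a fixed bond_length (e.g. a typical C-H distance) or a scale
    factor g, placing the link atom at r_qm + g * (r_mm - r_qm).
    """
    if (bond_length is None) == (scale is None):
        raise ValueError("specify exactly one of bond_length or scale")
    d = [m - q for q, m in zip(r_qm, r_mm)]
    dist = math.sqrt(sum(c * c for c in d))
    g = scale if scale is not None else bond_length / dist
    return tuple(q + g * c for q, c in zip(r_qm, d))


if __name__ == "__main__":
    c_beta = (0.0, 0.0, 0.0)     # QM frontier atom
    c_alpha = (1.53, 0.0, 0.0)   # MM host atom; C-C distance assumed
    print(place_link_atom(c_beta, c_alpha, bond_length=1.09))  # assumed C-H length
    print(place_link_atom(c_beta, c_alpha, scale=1.09 / 1.53))
```

Constraining the link atom to the bond line, whether at a fixed distance or a scaled one, makes its position a function of the QM and MM atom coordinates rather than an independent particle. Any force on the link atom must then be redistributed onto those two atoms by the chain rule.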
The approach is simple and often effective. However, the additional atom(s) introduce extra degrees of freedom, complicating the definition of the potential energy surface [146-148]. If certain interactions of the link atom are included, these interactions may be 'overcounted', i.e. represented by both QM and MM terms [149]. This can lead to problems with conformational properties. In the implementation of Field et al. [41], all MM bonding terms (bond, angle and dihedral energies) involving QM atoms were retained where such a term involved at least one MM atom. The link atom did not interact with the MM atoms, either electrostatically or through van der Waals terms. The treatment of the interactions of link atoms has been explored by Reuter et al. [145] and Antes and Thiel [147,150]. It has been found to be inadvisable to exclude link atoms from interactions with the MM charges. In fact, it is preferable for all QM atoms to interact with the same set of MM atoms, which also has implications for the definition of any nonbonded cutoff in a QM/MM simulation. In the ab initio QM/MM method of Eurenius et al. [149] and Lyne et al. [121], the QM atoms do not 'feel' the effects of the (MM) link atom host group. This group should be electrically neutral, to avoid introducing an unbalanced charge. B. Brooks has suggested a double link atom model, with classical and QM link atoms, which can avoid problems of introducing dipoles. Vasilyev [76] allowed interaction between the link atom and MM atoms, and zeroed the adjacent MM charge. Mulholland and coworkers have set to zero the charges of MM groups adjacent to QM regions in semiempirical [51] and ab initio QM/MM calculations [6]. This approach means that the electrostatic interactions of the QM and MM regions are comparable. Burton, Hillier et al. have explored different schemes for the treatment of link atoms, including neutralization of the MM atom at the junction, and the effects of 1-4 scaling [65]. In MM force fields, electrostatic (and van der Waals) interactions between bonded atoms, and between atoms separated by two bonds, are typically not included (being represented by MM bonded terms), while 1-4 interactions may be reduced by a scaling factor. Excluding or scaling QM/MM electrostatic interactions in this way may lead to poor results (although it is usually desirable to avoid spuriously large electrostatic QM/MM interactions with neighbouring groups); on the other hand, including them raises the question of consistency with the MM potential function. Other recently developed methods which overcome some difficulties of the link atom approach include the pseudobond method of Yang and coworkers [146], in which a one-free-valence atom with an effective core potential is used at the QM/MM boundary, and forms a 'pseudobond' with the QM system proper. The effective core potential is optimized so that the pseudobond has the same length and strength as the real bond, and other properties are maintained.
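Several of the schemes described above amount to choosing which MM charges the QM region sees near the boundary. The sketch below illustrates one such choice under stated assumptions. It first finds MM atoms within a given number of bonds of the QM region, using a breadth-first search over the bond graph, which is the same topological distance that defines 1-2, 1-3 and 1-4 exclusions. It then zeroes the charges of a designated host group, refusing to do so unless the group is neutral so that no net charge is removed. The group definition and the charge values are hypothetical illustrations, not parameters from any particular force field or from the cited studies.

```python
from collections import deque


def mm_atoms_within(bonds, qm_atoms, max_bonds):
    """MM atoms separated from the QM region by at most max_bonds bonds."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    depth = {atom: 0 for atom in qm_atoms}   # QM atoms are at depth 0
    queue = deque(qm_atoms)
    while queue:
        atom = queue.popleft()
        if depth[atom] == max_bonds:
            continue
        for nbr in neighbors.get(atom, []):
            if nbr not in depth:
                depth[nbr] = depth[atom] + 1
                queue.append(nbr)
    return {atom for atom, d in depth.items() if d > 0}


def zero_host_group(mm_charges, host_group, tol=1e-6):
    """Return MM charges with a neutral host group zeroed (no net charge removed)."""
    net = sum(mm_charges[a] for a in host_group)
    if abs(net) > tol:
        raise ValueError(f"host group has net charge {net:+.3f}; "
                         "zeroing it would unbalance the system")
    return {a: (0.0 if a in host_group else q) for a, q in mm_charges.items()}


if __name__ == "__main__":
    bonds = [("N", "HN"), ("N", "CA"), ("CA", "HA"), ("CA", "C"), ("C", "O"),
             ("CA", "CB"), ("CB", "CG"), ("CG", "OD1"), ("CG", "OD2")]
    qm_region = {"CB", "CG", "OD1", "OD2"}
    print(sorted(mm_atoms_within(bonds, qm_region, 1)))  # ['CA']
    print(sorted(mm_atoms_within(bonds, qm_region, 2)))  # ['C', 'CA', 'HA', 'N']
    # Hypothetical backbone charges; the N/HN/CA/HA group is neutral.
    charges = {"N": -0.47, "HN": 0.31, "CA": 0.07, "HA": 0.09, "C": 0.51, "O": -0.51}
    print(zero_host_group(charges, {"N", "HN", "CA", "HA"}))
```

Zeroing a whole neutral group, rather than a single charged atom, keeps the total charge of the system unchanged. This is the practical reason for the neutrality requirement on the host group mentioned above.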
The same core potential can be applied in Hartree-Fock and density-functional calculations, and is designed to be independent of the choice of MM force field. Studies of the effects of different QM/MM electrostatic interaction schemes will be required. For semiempirical QM/MM calculations, Antes and Thiel have developed 'connection atoms', which reproduce the structural and electronic properties of methyl groups [147]. They discuss QM/MM partitioning for schemes of types A, B and C. Röthlisberger et al. have developed pseudopotentials (e.g. for atomic cores) to represent accurately the properties of QM/MM partitioned molecules in density-functional Car-Parrinello molecular dynamics simulations using plane wave basis sets [56]. Eichinger et al. have developed the scaled position link atom method (SPLAM) to overcome some of the difficulties of other link atom approaches [148].
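The extra-degrees-of-freedom problem can be made concrete with a minimal sketch of one common convention (an assumption here, not the method of any specific program cited above): the hydrogen link atom L is not treated as a free particle but is placed on the line from the QM boundary atom Q to the MM boundary atom M, r_L = r_Q + g (r_M - r_Q), and any force acting on L is passed on to Q and M by the chain rule. The fixed ratio g, taken from typical C-H and C-C bond lengths of 1.09 and 1.53 Å, and the function names are illustrative assumptions; this is not claimed to reproduce SPLAM.

```python
from typing import Tuple

Vec = Tuple[float, ...]

# Illustrative assumption: ratio of a typical C-H bond length (1.09 Å)
# to a typical C-C bond length (1.53 Å).
G_DEFAULT = 1.09 / 1.53


def link_atom_position(r_q: Vec, r_m: Vec, g: float = G_DEFAULT) -> Vec:
    """Place link atom L on the Q->M bond vector: r_L = r_Q + g * (r_M - r_Q)."""
    return tuple(q + g * (m - q) for q, m in zip(r_q, r_m))


def distribute_link_force(f_link: Vec, g: float = G_DEFAULT) -> Tuple[Vec, Vec]:
    """Share the force on L between Q and M (chain rule).

    Because r_L = (1 - g) * r_Q + g * r_M, the derivative of r_L is (1 - g)
    with respect to r_Q and g with respect to r_M, so L carries no
    independent degrees of freedom.
    """
    f_q = tuple((1.0 - g) * f for f in f_link)
    f_m = tuple(g * f for f in f_link)
    return f_q, f_m


if __name__ == "__main__":
    # Hypothetical geometry: QM carbon at the origin, MM carbon 1.53 Å along x
    r_l = link_atom_position((0.0, 0.0, 0.0), (1.53, 0.0, 0.0))
    print(r_l)  # link atom about 1.09 Å from the QM atom
    print(distribute_link_force((0.0, 0.0, -1.0)))
```

Because the link atom position is a fixed function of Q and M, the forces on Q and M sum to the original link atom force, so no extra coordinates enter the optimization or dynamics.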
3. QM/MM METHODS
A number of QM/MM implementations have been published in recent years, and aspects of their performance described. Many of these QM/MM methods are available to researchers as academic or commercial software packages, while a number of groups have developed their own codes. The number of published QM/MM methods is now quite large, so only a small number can be mentioned here. The QM/MM method of Field et al. was implemented in the widely used macromolecular simulation package CHARMM [116], and has subsequently been developed further [51,55]. Hodoscek, Brooks and coworkers coupled CHARMM with the quantum chemistry package GAMESS-US [151] for ab initio molecular orbital QM/MM calculations. The performance of this method has been tested by Eurenius et al. [149] and by Lyne et al. [121]. The latter workers also coupled CHARMM with the CADPAC program [152] for density-functional QM/MM calculations, and compared Hartree-Fock ab initio and density-functional QM/MM treatments of the same systems. GAMESS-UK has also been combined with CHARMM [116] for QM/MM calculations [129]. The original QUEST ab initio QM/MM program of Singh and Kollman [45] combined the AMBER simulation package with an early version of the Gaussian program (Gaussian-80). Burton, Hillier and coworkers have combined Gaussian-94 [153] and AMBER [21] for ab initio QM/MM calculations using a variety of coupling schemes [65,74]. The HyperChem modelling package can perform semiempirical QM/MM calculations [144]. Parrinello and coworkers [56,70,148] have developed a scheme for Car-Parrinello molecular dynamics simulations with a QM/MM method, using the CPMD and EGO programs (EGO is based on the CHARMM force field for proteins). Field, Amara et al. have developed the DYNAMO program for semiempirical QM/MM calculations [72], using the OPLS-AA force field [154].
Stanton, Hartsough and Merz combined the density-functional program deMon with the AMBER MM force field, the first coupling of this kind [122]. These workers have also studied an ab initio Hartree-Fock/MM coupled potential, combining Gaussian-92 and AMBER 4.0 [155]. Wei and Salahub have also modified deMon for QM/MM molecular dynamics simulations with the SPC and TIP3P [156] MM water models [89]. Hartsough and Merz have used a combination of the PM3 semiempirical method with AMBER to simulate the dynamics of the active site of carbonic anhydrase [57]. Liu and Shi [157] developed an interface between the molecular mechanics/dynamics package GROMOS [22] and the semiempirical program MOPAC [132], which has been applied by Liu, van Gunsteren and coworkers to study enzyme mechanisms [54]. Tu and Laaksonen have combined GAMESS-US and Gaussian-94 with MCMOLDYN [158].
Philipp and Friesner have discussed the development of an ab initio QM/MM method within the Jaguar program [141]. Ryde has presented the COMQUM program [80]. Thiel and coworkers have discussed combinations of MNDO-type methods with the MM3 force field [112,130,147,159]. Several other packages are available [102,103].
3.1. Method development and testing
QM/MM methods contain a significant empirical component, so it is vital that they are tested to ensure that they represent the processes and systems of interest with sufficient accuracy. Indeed, developing a QM/MM method requires considerable effort and background work. The MM parameters (such as atomic charges and Lennard-Jones parameters) should be tested and optimized [126,127]. The performance of the QM/MM partitioning and interaction schemes should also be investigated. Typical of the tests carried out in developing a QM/MM method are those of Field et al. [41]: these involved calculations on interactions of small molecules (e.g. formate with water, formaldehyde with water, formamide with methanol), with one molecule treated by QM and one by MM, and comparison of binding energies and geometries with fully ab initio calculations. The classical TIP3P model [156] was used for (MM) water. The modification made to this model in CHARMM [20], adding van der Waals radii to the water hydrogen atoms, was found to produce an incorrect geometry for the chloride/water complex, so the unmodified TIP3P model [156] was preferred [41]. To test the link atom approach for partitioning individual molecules into QM and MM regions, rotational barriers, deprotonation energies, proton affinities, ionization potentials and dipole moments were calculated for some small molecules and compared with experimental, MM and QM results. In general, the QM/MM results were found to be satisfactory. Limited accuracy was attributed in some cases to the semiempirical QM methods (AM1 and MNDO), and in others to the QM/MM method itself. Eurenius et al. have studied QM/MM partitioning in ab initio QM/MM calculations, in particular the treatment of internal energy terms and QM/MM electrostatic interactions within partitioned molecules [149].
They found that careless partitioning, for example resulting in a net MM charge in a neutral molecule, could produce very poor results. Lyne et al. tested ab initio molecular orbital and density-functional QM/MM models of the water dimer, and of a water molecule interacting with Ca2+, Mg2+ or Cl- ions [121]. To test the effects of link atoms, models of biological ligands interacting with divalent cations were compared with pure quantum calculations on the same complexes. The QM/MM methods were found to perform well for the structures, binding energies and charge distributions. Philipp and Friesner have stressed the importance of testing conformational properties, and tested their ab initio Hartree-Fock QM/MM implementation on the alanine dipeptide and tetrapeptide [141]. These authors warn that parameter development for QM/MM methods may be at least as intensive as for purely MM methods. Tu and Laaksonen [158] have investigated the effects of Lennard-Jones parameters in ab initio QM/MM molecular dynamics simulations of liquid water (with one water molecule treated by QM at the RHF/6-311G(d,p) level). They found a strong dependence of the results (e.g. the structure of water, based on radial distribution functions, and the polarization of the QM water molecule) on the Lennard-Jones parameters used for the QM atoms. Using the same (flexible) TIP3P MM water model as Field et al. [41], they found that the coupling between the QM water and the MM waters was too strong (first peaks in the radial distribution functions too close and too high compared with experiment), despite the induced dipole moment being calculated correctly [158]. Lennard-Jones parameters optimized on the basis of QM/MM calculations on the water dimer, however, did not improve the simulation results, and gave too weak a polarization of the QM water molecule. These results demonstrate the difficulties that can arise in developing QM/MM parameters to reproduce a range of properties, particularly in dynamical simulations. Methodological improvements, for example improved MM representations including polarization, may be required. Stanton et al. calculated density-functional QM/MM free energies of solvation for a number of ions using a molecular dynamics free energy perturbation method [122]. They found that the solvation free energies of most of the ions studied were overestimated, except for chloride, which was underestimated. Gao and coworkers have worked extensively on the application and development of QM/MM methods, particularly for the investigation of solvation effects.
Gao used Monte Carlo simulations to calculate the absolute solvation energy of chloride ion (described by AM1, with the OPLS Lennard-Jones term) in TIP3P water [160]. Gao and Alhambra have extended calculations of this type to incorporate the Ewald lattice-sum method for long-range electrostatic interactions [124]. Gao and Freindorf have developed an ab initio Hartree-Fock/MM method [127], and applied it in Monte Carlo simulations in solution [161]. For simulations of N-methylacetamide in solution [161], the Lennard-Jones term for nitrogen was modified to improve the QM/MM description of hydrogen bonding with water. Gao has also discussed a QM/MM method using a polarizable MM solvent model [119]. Clark, Lanig et al. have described the testing and validation of their semiempirical QM/MM implementation by comparing calculated and experimental adsorption energies of small organic molecules in various zeolites [103]. This validation indicates that the method is also appropriate for systems that may be less well defined experimentally, such as enzyme complexes. Bash, MacKerell, Ho and coworkers have described a procedure for the development of QM/MM parameters for chemically accurate simulations [126,162]. The parameters of a semiempirical QM method (in this case AM1) are optimized so as to reproduce as closely as possible the experimental heats of formation, dipole moments and MP2/6-31G(d) geometries of the molecules of interest (typically small molecules containing the same functional groups as those involved in an enzyme reaction). These system-specific parameters can be derived using a genetic algorithm. This procedure should provide an accurate QM model that is at the same time computationally efficient. Next, the interactions of the functional groups with water molecules are calibrated. The QM/MM interaction energy in bimolecular complexes of a small molecule treated by QM (AM1) with a water molecule (TIP3P) in different relative orientations is fitted to RHF/6-31G(d) values for the same complexes (interaction energies scaled by 1.16 for neutral molecules [20]). This is achieved by adjusting the van der Waals parameters of the QM atoms. Geometries and interaction energies for these QM/MM complexes are compared with purely RHF/6-31G(d) results (the RHF/6-31G(d) method provides a reasonable description of hydrogen-bonded complexes [7]). A similar procedure has been used in the development of MM parameters for CHARMM [20]. Finally, free energies of reaction in solution (e.g. of proton or hydride transfer between the small molecules) are calculated by QM/MM free energy perturbation calculations [126] using simple geometric reaction coordinates (constrained using SHAKE [163]). The reliability of the free energy calculations can be tested by independent forward and backward simulations of the same reaction.
The calculated reaction free energies can then be compared with experimental results to test the accuracy of the QM/MM model. QM/MM parameters optimized through this procedure [126,162] for methanol, methoxide, imidazole, imidazolium, formaldehyde, nicotinamide and 1,4-dihydronicotinamide have been applied in QM/MM studies of the reaction mechanism of malate dehydrogenase [61] (see section 6). Reuter et al. have compared the link atom and local self-consistent field (LSCF) methods for QM/MM partitioning of molecules in detail [145]. They calculated, at the semiempirical QM level, deprotonation energies and proton affinities of propanol and a tripeptide, and also tested QM/MM geometry optimizations of ethane and butane. Both the link atom and LSCF partitioning schemes were found to perform reasonably well, given a good choice of partitioning into QM and MM regions, and recommendations regarding the treatment of the QM/MM frontier were provided. Antes and Thiel [150] studied different schemes for the treatment of link atoms in ab initio, density-functional and semiempirical QM/MM calculations of proton affinities for a series of alcohols. They found semiempirical QM/MM results to be less sensitive to the details of the link atom treatment than density-functional or ab initio results. Burton et al. compared a mechanical embedding scheme for electrostatic interactions (i.e. calculating these interactions purely classically, MO+MM [114]) with QM/MM schemes in which the QM region is influenced by the surrounding point charges [65]. Small peptides were studied. For the embedding scheme, analysis of the electrostatic energies indicated that problems could be caused by the scaling of 1-4 interactions commonly used in MM force fields [65].
4. TECHNIQUES FOR REACTION MODELLING
4.1. Optimization of transition structures and reaction pathways
A useful QM/MM method should be able to calculate the energy of a system undergoing chemical change in an enzyme to a reasonable level of accuracy, but this represents only part of the requirement for modelling such a reaction. Techniques for optimizing the structures of key species in the reaction are needed, and beyond this, methods for conformational sampling, simulating molecular dynamics and calculating activation energies. Locating minimum energy structures is reasonably straightforward with modern algorithms [9,123,164], given a means of calculating the forces in a system, but optimizing transition state structures and reaction pathways, calculating vibrational frequencies and characterizing stationary points are more demanding. Algorithms developed for small molecules are not suitable for such large systems with many degrees of freedom: direct calculation, storage and manipulation of Hessian matrices becomes extremely difficult. Fixing the positions of most of the atoms in the system is a drastic limitation in a flexible system such as a protein. Many groups have been active in the development of geometry optimization algorithms applicable to large molecules [113,130,164-166]. One such development is incorporated in the software package GRACE [58,167], designed specifically for QM/MM calculations on enzyme reactions. A good discussion of the nature of the problems, of the GRACE program, and of some alternative approaches is provided by Turner, Moliner and Williams [167]. In the approach used in GRACE, the system is divided into two subsets, the 'environment' and the 'core'. These may or may not correspond to the MM and QM regions of a QM/MM calculation, but the core will contain the QM atoms, as this is the region where the chemical changes occur.
Partial rational function optimization is used to search for a transition structure in the core degrees of freedom. The environment is optimized to a minimum before each step, and so is described as the fast-cycling subset. The advantage of this divided approach is that a Hessian matrix is required for the core region only. This method has been used to optimize transition state structures for lactate dehydrogenase [167]. Six transition state structures were optimized from six different starting structures taken from a classical molecular dynamics trajectory. Some features of these structures were fairly constant, in particular regarding the nature of the chemical change, but the positions of active site residues, and the energies of the structures, differed by large amounts. This demonstrates the need to treat structural variability in modelling an enzyme reaction. It is indicative of a significant general problem, namely that protein potential energy surfaces have many minima of similar energy, separated by small barriers [9]. A (reaction) path between two conformations may therefore pass through several minima and saddle points, and the path for a chemical reaction may involve many related structures. One possibility for studying a reaction path in a protein is to use one of a number of 'non-local' methods, which treat the pathway as a whole, with the aim of finding all the minima and barriers along it [168]. Methods of this type have been applied to conformational changes in a number of proteins. For example, Elber and coworkers have developed a variety of techniques to study reaction paths in proteins and other large systems [169-172]. One such approach is to minimize the average energy of a series of points along an approximate interpolated path between the initial and final structures [169,170]. This path energy minimization method has been further developed by Smart for the location of transition structures [168,173]. Another method is the conjugate peak refinement algorithm, which has been developed to locate transition structures (saddle points) and reaction paths in proteins [174].
This method (available in the TRAVEL module of CHARMM) has been applied to biological macromolecules, for example in modelling the mechanism of rotamase catalysis by FK506 binding protein (using a tailored MM potential function) [18,175]. For large systems, these techniques have the advantage over Newton-Raphson-type TS searches that only the first derivatives of the energy are required. They should also determine the entire reaction pathway, possibly traversing several energy barriers and minima. Such methods will probably become increasingly important in QM/MM studies of enzyme reactions. When the entire pathway is treated, however, the computational expense increases correspondingly. A basic means of modelling approximate reaction paths is the 'adiabatic mapping' or 'coordinate driving' approach [123,149]. The energy of the system is calculated by minimizing the energy at a series of fixed (or restrained, e.g. by harmonic forces) values of a reaction coordinate, which may be, for example, the distance between two atoms. More extensive and complex combinations of geometrical variables can be chosen. This approach is only valid if one conformation of the protein can represent the state of the system at a particular value of the reaction coordinate. If several different conformations allow that value to be reached, then all of them should be included in a full description. If only one conformation of the protein appears to be involved in the reaction, a single minimum energy structure of this conformation may adequately represent the several closely related structures making up this conformational state. Minimizing the QM/MM potential energy of such a representative conformation along the reaction coordinate should therefore provide a reasonable approximation to the enthalpic component of the potential of mean force (the free energy profile) for the reaction [123]. Adiabatic mapping will overestimate energy barriers if atom movements connected with the reaction are not included in the reaction coordinate, because minimization alone will not allow the fluctuations necessary to relieve steric strain to occur [9]. Discontinuities can result, making it difficult to identify the location, as well as the height, of the barrier. Paths calculated in the forward and reverse directions may differ significantly. It is essential that the structure is well minimized before the reaction coordinate calculations are carried out. If it is not, the minimization performed at steps along the reaction coordinate will be dominated by relaxation of atoms in areas other than the active site, and energy differences calculated along the path will be masked by a downward shift in energy throughout. Despite the limitations and drawbacks of the simple adiabatic mapping (coordinate driving) approach, it has been used in many QM/MM applications to date. It has the advantage that it is simple to apply, and does not require intensive calculations such as second derivative evaluations, or simultaneous treatment of several points on a pathway.
It can be useful for quick initial scans of potential energy surfaces, and for generating approximate models of transition states and intermediates in which some allowance is made for structural relaxation in response to chemical changes at the active site. It is suitable only for reactions involving small chemical and structural changes in a small number of groups, but may be useful in such cases: approximate QM/MM barriers for an enzyme reaction calculated in this way have been shown to correlate with experimental activation energies [49]. A good approach is to base the reaction coordinate for a QM/MM calculation on transition state and stable structures (and preferably on the minimum energy reaction path, such as the intrinsic reaction coordinate) calculated for representative small model systems [7,8,51]. Important structural features of the reaction should be similar in the gas-phase model and in the enzyme.
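A minimal sketch of adiabatic mapping (coordinate driving) on an assumed two-dimensional model surface: x plays the role of the reaction coordinate and y stands for all the remaining ('environment') degrees of freedom. At each fixed value of x, y is relaxed by steepest descent, starting from the relaxed value at the previous point, and the relaxed energies form the approximate profile. The model surface, its constants and the step size are illustrative assumptions. On this smooth surface forward and reverse scans agree; on a real protein surface, as noted above, they may not.

```python
from typing import List, Tuple

# Assumed toy surface parameters (not taken from any real system)
K_ENV = 2.0   # stiffness of the 'environment' coordinate y
C_ENV = 0.5   # y prefers to follow C_ENV * x**2 as the reaction proceeds


def model_energy(x: float, y: float) -> float:
    """Double well along x (barrier of 1 at x = 0) coupled to an environment y."""
    return (x * x - 1.0) ** 2 + K_ENV * (y - C_ENV * x * x) ** 2


def relax_y(x: float, y0: float, step: float = 0.1,
            tol: float = 1e-10, max_iter: int = 10_000) -> float:
    """Minimize the energy over y with x held fixed (steepest descent)."""
    y = y0
    for _ in range(max_iter):
        grad = 2.0 * K_ENV * (y - C_ENV * x * x)  # dE/dy
        if abs(grad) < tol:
            break
        y -= step * grad
    return y


def adiabatic_map(x_values: List[float],
                  y_start: float = 0.0) -> List[Tuple[float, float, float]]:
    """Drive the reaction coordinate through x_values, relaxing y at each point.
    Each minimization starts from the previous point's relaxed y."""
    profile = []
    y = y_start
    for x in x_values:
        y = relax_y(x, y)
        profile.append((x, y, model_energy(x, y)))
    return profile


if __name__ == "__main__":
    xs = [-1.0 + 0.1 * i for i in range(21)]
    forward = adiabatic_map(xs)
    reverse = adiabatic_map(list(reversed(xs)))
    barrier = max(e for _, _, e in forward) - forward[0][2]
    print(f"approximate barrier: {barrier:.6f}")
```

Warm-starting each minimization from the previous point mimics how coordinate driving is usually run; it is also the source of the hysteresis between forward and reverse scans when the environment has several nearby minima.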
4.2. Activation free energies, conformational behaviour and dynamics
Many enzymes undergo significant conformational changes during their catalytic cycles, and these may be among the slowest steps in the overall reaction [1]. Simulating conformational changes is particularly difficult if large structural changes are involved and occur over a long timescale (e.g. changes driven by substrate binding). Conformational changes on a smaller scale (perhaps of a small number of amino acid side chains) may occur during catalysis by some enzymes, coupled to chemical changes. Although these may be more amenable to simulation because of their size, fluctuations of this sort are usually not apparent from crystallographic structures, so it is difficult to assess the nature of the change to be studied. A further issue for problems of this kind is how to study the coupling of changes occurring in one part of a protein with structural changes elsewhere. Allosteric effects are one example of structural changes being propagated over large distances in a protein [1]. Minimization methods relieve local strain efficiently, but are not good at transmitting effects to other parts of the structure [123]. To simulate changes of this kind a dynamic treatment would probably be necessary, or alternatively a means of forcing the system between two known structures [174] could be used. In addition to the need to simulate conformational changes associated with reaction, the fact that proteins exhibit significant structural variability (i.e. within a given 'conformation'), as outlined above, must be taken into account. Molecular dynamics simulations using MM potential functions may be useful for generating a range of representative structures (e.g. of an enzyme-substrate complex) for subsequent QM/MM calculations on the reaction. MM dynamics simulations are useful because they can access significantly longer timescales (e.g. in the nanosecond range [20-22,125]) than are feasible in QM/MM simulations, allowing much better sampling. Ab initio QM/MM calculations are currently too computationally demanding for dynamics simulations of reasonable length on realistic models of most enzyme reactions [19]. Molecular dynamics simulations of enzyme reactions have been performed successfully with semiempirical QM/MM methods [54,57,64,72] (see section 6). The sampling provided by such QM/MM molecular dynamics simulations may be used to calculate activation free energies (and to address dynamical effects on the reaction). Thus, semiempirical QM/MM simulations have an important role to play. It has been suggested that a mapping procedure can be used to calculate ab initio QM/MM reaction free energies from empirical valence bond simulations [39,176]. This approach shows promise, but calculating energies within the QM system (as opposed to its interaction with its surroundings) from such a simulation remains problematic.
The quantity of most interest in the analysis of an enzyme reaction is the activation free energy [2]. Calculating activation free energies, and free energy differences more generally, in complex systems such as proteins is a formidable challenge, and many approaches to aspects of this problem are being developed. If a good reaction coordinate can be defined, the free energy profile for the reaction along this coordinate (the potential of mean force) can be calculated by umbrella sampling or free energy perturbation techniques [9]. Defining an appropriate reaction coordinate that incorporates all the important degrees of freedom may be difficult. For umbrella sampling calculations, it is not clear how to calculate suitable potentials to restrain the reaction coordinate. Methods that treat entire reaction pathways show great promise for the calculation of activation free energies in condensed systems [177]. The transition state theory rate constant [26] can be found by calculating the equilibrium constant between the reactants and the TS (from the probability of the system being in the vicinity of the TS along the reaction coordinate) and the equilibrium rate of barrier crossing. Calculations of the potential of mean force in solution, using reaction paths from IRC calculations in the gas phase, have reproduced experimentally measured solvent effects for several types of reaction, and given valuable insight into the origin of these effects [23]. However, the potential of mean force is found by equilibrating the surrounding solvent with the reactive system at a series of fixed values along the reaction coordinate. In reality the dynamics of the surrounding solvent affect the reaction, so treating them separately is an approximation that may produce misleading results.
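The link between an activation free energy and a transition state theory rate constant can be sketched with the standard Eyring-type expression k = kappa (k_B T / h) exp(-dG_act / RT), where kappa is a transmission coefficient (kappa = 1 in ideal TST). The sketch also converts kappa into the equivalent increase in barrier height, -RT ln kappa; for example, kappa = 0.4 corresponds to about 2.3 kJ/mol (roughly 0.5 kcal/mol) at 298.15 K. The function names, the use of kJ/mol and the example barrier are assumptions for illustration, and treating the barrier of a potential of mean force as dG_act is itself an approximation.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # molar gas constant, J/(mol K)


def tst_rate(dg_act_kj_mol: float, temperature: float = 298.15,
             kappa: float = 1.0) -> float:
    """Eyring-type TST rate constant in s^-1:
    k = kappa * (k_B T / h) * exp(-dG_act / (R T)); kappa = 1 is ideal TST."""
    prefactor = K_B * temperature / H
    return kappa * prefactor * math.exp(-dg_act_kj_mol * 1000.0 / (R * temperature))


def kappa_as_barrier_shift(kappa: float, temperature: float = 298.15) -> float:
    """Increase in activation free energy (kJ/mol) with the same effect on k as kappa."""
    return -R * temperature * math.log(kappa) / 1000.0


if __name__ == "__main__":
    dg = 60.0  # hypothetical activation free energy, kJ/mol
    print(f"k (ideal TST) = {tst_rate(dg):.3e} s^-1")
    print(f"k (kappa=0.4) = {tst_rate(dg, kappa=0.4):.3e} s^-1")
    print(f"kappa=0.4 is equivalent to +{kappa_as_barrier_shift(0.4):.2f} kJ/mol")
```

Because the rate depends exponentially on the barrier, a transmission coefficient of order 0.1 to 1 changes the rate far less than an error of a few kJ/mol in the calculated activation free energy.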
Solvent dynamics have been found to play an important role in, for example, SN2 reactions in solution, for which a successful reaction requires motion of the solvent and its adoption of a configuration that allows the reaction to proceed [178,179]. Warshel advocates the use of a valence bond mapping potential involving the energy difference between EVB structures, rather than a geometric reaction coordinate, to include the effects of solvent properly [39]. In transition state theory, dynamical effects are included approximately through a transmission coefficient in the rate expression [9]. This lowers the rate from its ideal maximum TS theory value, and should account for barrier recrossing by trajectories that reach the TS (activated complex) region but do not successfully cross to products (in TS theory, all trajectories reaching this point are assumed to do so). The transmission coefficient can be calculated by activated molecular dynamics techniques, in which molecular dynamics trajectories are started close to the TS and their progress is monitored to find the velocity at which the barrier is crossed and the proportion that go on to react successfully [9,26,180]. Activated processes cannot be studied by standard molecular dynamics because barrier crossing events occur so rarely. One reason for the failure of a trajectory to reach the products for a reaction in solution is that the structure of the surrounding solvent may prevent completion of the reaction. The interiors of proteins are more densely packed than liquids [181], so the participation of the protein atoms surrounding the reactive system in an enzyme-catalysed reaction is likely to be at least as important as that of the solvent for a reaction in solution. There is experimental evidence that protein dynamics may modulate barriers to reaction in enzymes [10,11]. Ultimately, therefore, the effects of the dynamics of the bulk protein and solvent should be included in calculations on enzyme-catalysed reactions. Dynamical effects in enzyme reactions have been studied in empirical valence bond simulations: Neria and Karplus [180] calculated a transmission coefficient of 0.4 for proton transfer in triosephosphate isomerase, a value fairly close to unity, representing a small dynamical correction. Warshel has argued, based on EVB simulations of reactions in enzymes and in solution, that dynamical effects are similar in both, and therefore that they do not contribute to catalysis [39]. Warshel and coworkers also calculate quantum dynamical effects in an enzyme reaction to be small, in EVB simulations employing path integral techniques [182]. Warshel finds enzymic catalysis in general to be due to a reduced activation free energy, because of the lower reorganization energy in the enzyme than in solution [12,37]. However, the apparently small size and similarity of dynamical corrections in solution and in enzymes for some reactions studied to date does not guarantee that they can always be safely ignored, for example in the calculation of an individual rate constant. Activated dynamics calculations have been carried out for rotations of the aromatic ring side chain of a tyrosine residue (Tyr35) in bovine pancreatic trypsin inhibitor (BPTI) [183,184].
The rates of rotation calculated by these techniques are not in good agreement with NMR results for this residue, however, and this is probably due at least partly to the difficulty of defining a reaction coordinate which encompasses the movement of the large number of atoms involved in the process [9]. An accurate activated dynamics calculation requires a reaction coordinate which describes all the pertinent features of the reaction. In a protein, a reaction path may involve coupled movement of many atoms, and will therefore be hard to describe by a single one-dimensional function. The reaction coordinate will almost certainly have to include several internal coordinates if it is to be realistic. For example, the rotation of Tyr-35 in B PTI mentioned above is accompanied by disruption of the surrounding protein (this is required to lower steric repulsion), and so the reaction coordinate used had to incorporate not only the side chain dihedral angle for the rotation of the ring itself, but also an angle describing the orientation of the ring with respect to the nearby protein backbone. Analysis of activated dynamics trajectories started from the TS along this reaction coordinate showed that the rotation is
624
preceded by movement of the backbone to a structure in which the isomerization can occur freely; the rotation is said to be 'gated' by this backbone movement [185]. In cases such as this, for which steric interactions with residues not involved directly in the activated process make a significant contribution, it may be essential to include other reaction coordinates (i.e. to examine the free energy surface in these several dimensions rather than the profile along a single reaction coordinate [186]). It has been suggested that a reduced set of a few internal coordinates can be used to determine an accurate one-dimensional reaction coordinate for a conformational change of this type in a protein [ 187]. The energy of the rest of the system is minimized at every point defined by the fixed values of the few essential coordinates to create an adiabatic surface (it is assumed that the coordinates not used to define the surface are equilibrated at every point with the coordinates describing the change, and do not participate in the change, and that one conformation of the whole system adequately represents a point on the surface). The advantage of this method is that the TS on this surface (which is a function of only a small number of variables, the essential internal coordinates) may be found efficiently using methods analogous to those used for small molecules. A saddle point on this reduced surface should correspond to a TS for the whole system. The TS may then be used to determine a reaction path on the adiabatic surface, which may be used in free energy calculations. The rotation of a tosyl ring bound to an active site residue of o~-chymotrypsin has been studied in this way [ 188]. Warshel has however argued that the reaction coordinate should not be defined in terms of geometric degrees of freedom of the reacting system. Instead, a mapping potential based on valence bond energies of e.g. 
reactant and product states, is preferred, and ab initio QM/MM activation energies can be calculated from the EVB simulations by a mapping procedure [39,176]. Molecular dynamics simulations can be performed using techniques developed for purely MM calculations [9,20-22,123]. An alternative for QM calculations is provided by the ab initio molecular dynamics scheme proposed by Car and Parrinello [189], which combines the electronic structure calculation directly with the molecular dynamics steps (including the electronic degrees of freedom as dynamical variables). Most applications of ab initio molecular dynamics have been based on density-functional methods. The ab initio MD approach has also been applied in QM/MM simulations, for example in applications to carbonic anhydrase [56] and galactose oxidase [70,190]. It makes it possible to simulate the dynamics of a large system in a computationally efficient manner while treating the central region with a correlated QM method, and will certainly become increasingly important in studies of enzyme reactions. The density-functional approach, with the use of pseudopotentials, will be especially useful for metalloenzymes. A final important consideration is that of quantum effects for the nuclei in the reaction. It should be stressed that the term QM/MM as employed in this chapter refers to methods which calculate a potential energy surface through a QM treatment of the electronic structure of part of the system. Molecular dynamics simulations of enzymes with QM/MM potentials have generally employed only classical dynamics, calculating classical trajectories for all the atoms. This applies also to ab initio molecular dynamics simulations. For reactions such as proton transfer and hydride transfer, however, quantum dynamical effects are likely to be significant [10,11]. Quantum effects include tunnelling and zero-point corrections. To include such effects, it may be possible to apply methods which treat some nuclear degrees of freedom quantum dynamically, in mixed quantum/classical dynamics simulation schemes [38,191]. Various methods have been developed for studying quantum dynamics in a classical environment [192-196]. Feynman path integral based techniques also allow quantum effects to be studied [72,182]. Semiclassical variational transition state theory is another approach, which includes quantized vibrations and semiclassical multidimensional tunnelling contributions. This method has recently been applied in QM/MM calculations on the reaction in the enzyme enolase by Alhambra et al. [67] (described in section 6.4 below). It was found that quantum effects are important in determining the absolute rate constant.
5. PRACTICAL ASPECTS OF MODELLING ENZYME REACTIONS
5.1. Choice and preparation of the starting structure
A first requirement for a QM/MM calculation on an enzymic reaction is a detailed, accurate structure that closely resembles a point on the pathway of the chemical reaction. In practice, this means that a high-resolution X-ray crystallographic structure of an enzyme complex is needed. The ensembles of structures provided by NMR can provide useful complementary information on dynamics and interactions, but generally cannot define atomic positions precisely enough for mechanistic calculations. Homology models are not suitable, even when they correctly represent the protein fold and secondary structure, as the protein backbone position can at best be predicted only to within an RMS deviation of 1-1.5 Å, and the positions, relative orientations and packing of side chains in the model are likely to be considerably less accurate. A crystal structure of an enzyme alone, with no ligands bound at the active site, may also be of little use, because it is difficult to predict binding modes and the protein conformational changes associated with binding.
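The RMS deviation quoted above is the usual measure of how far a model departs from a reference structure. The following is a minimal sketch in plain Python. The function name `rmsd` and the coordinates are illustrative assumptions. It also assumes the two structures are already superimposed. A real comparison would first perform an optimal rigid-body fit, for example with the Kabsch algorithm, which is not shown here.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)
    coordinates, in the input units (e.g. angstrom).

    Assumes the structures are already superimposed: no optimal rigid-body
    fit (such as the Kabsch algorithm) is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Two hypothetical three-atom fragments, every atom displaced by 1.0 angstrom.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
print(f"RMSD = {rmsd(reference, model):.2f} A")  # RMSD = 1.00 A
```

An RMSD of 1-1.5 Å on the backbone alone is already comparable to the changes in bond lengths that occur during a chemical step. This is why such models are unsuitable for mechanistic work.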
It is essential that the structure used as a starting point for calculations is truly representative of the reacting enzyme complex. In most instances at present, the structure of an enzyme-inhibitor complex must be used. The inhibitor should closely resemble the substrate, product, transition state or an intermediate in its bound conformation (some care must be taken, as inhibitors have occasionally been incorrectly identified as transition state analogues on the basis of high-affinity binding, leading to incorrect assumptions about the transition state structure). It is generally not possible to solve the structures of active enzyme-substrate complexes, for the obvious reason that the reaction turns over too quickly. In favourable cases, however, crystallographers have been able to trap and study reactive complexes (of enzymes with substrates or intermediates bound) [16]. This can be achieved through rapid changes in pH or temperature (e.g. by freezing crystals), with mutants that are catalytically incompetent for one or more of several steps, or by using alternative substrates that react slowly or incompletely. Short exposure times, achieved with synchrotron radiation, and techniques for triggering reaction synchronously throughout a crystal (and monitoring its progress) are also important. In some cases, therefore, it may be possible to approach the ideal situation in which structures of several complexes along the reaction pathway are available to begin calculations. The experimentally determined structures would indicate differences in conformation between these various complexes. QM/MM calculations could then model the unstable intermediates and transition state structures, and the mechanism of chemical and structural changes, providing a picture of the whole pathway [17]. This combination of theory and experiment holds great promise [125], and is likely to be increasingly important in future.
Even where only a single structure representing an enzyme complex exists, however, calculations can provide significant insight. As with any protein simulation, the nature and limitations of the structural solutions for proteins provided by X-ray crystallography should always be borne in mind [125]. One obvious point is that hydrogen atoms are generally not observed because of their low electron density (neutron diffraction experiments can be useful to overcome this problem), and so it can be difficult to assign protonation states unambiguously and to decide between possible rotamers or tautomers. This, and other factors such as model bias (for example in a molecular replacement solution) or simple error in the construction of a model, may lead to the structural model being incomplete or incorrect in some places. It should be remembered that a crystal structure represents an average not only over all the molecules in the crystal, but also over the time course of data collection. Dynamic and static disorder can lead to weak electron density in some regions, complicating the building and refinement of a structural model [9,125]. Caution should be exercised, and such factors considered, when a
structure is chosen for calculations. One basic example is the decision of which conformation to use in a simulation when alternative conformations for a side chain or ligand have been included in the crystallographic structure. It is very important to assign the protonation states of protein groups correctly. The pKa values of titratable groups may be altered considerably from their solution values by the protein environment. An incorrect choice of charge may lead to disruption of the protein structure or, particularly for groups close to the active site, may affect the reaction energetics. At worst, the wrong choice of charge state for an active site group may lead to the wrong mechanism being modelled. It can be helpful to carry out initial calculations of pKa values, for example by finite-difference Poisson-Boltzmann methods [197]. Another procedure, which can be useful for studying a small number of potentially charged groups, is to perform (MM or QM/MM) molecular dynamics simulations on the different possible charge states, and examine which remains closest to, and is most consistent with, the crystal structure [50,51].
5.2. Choice of theoretical model
Following construction of an initial model complex, energy minimization should be carried out to relax structural strain. In a full treatment, solvent should be included in the calculation (to model solvent dielectric shielding, etc.) and, if it is represented explicitly, it should be equilibrated after solvation of the structure (for example by molecular dynamics). A molecular dynamics simulation can also be useful to allow for small conformational changes in going from an inhibitor to a substrate complex, or for generating structural 'snapshots' for calculations (see section 4 above). As with many other aspects of a QM/MM calculation, the body of knowledge and expertise built up in many years of simulations of biological macromolecules has a lot to contribute [20-22].
For example, it has been demonstrated that the treatment of electrostatic interactions can be vitally important in determining the accuracy of molecular dynamics simulations of biomolecules. Use of a nonbonded cutoff can introduce errors, as can truncation of the simulation system. Such choices must be made with care in a QM/MM calculation, bearing in mind the relatively large computational cost of QM/MM, as opposed to MM, calculations. A compromise will often have to be made between the level of QM theory, the size and treatment of the simulation system, and the nature of the calculations (for example, whether dynamics are to be simulated or the potential energy surface explored). It is of course necessary to use a level of QM treatment that adequately represents the system or reaction under consideration. Semiempirical molecular orbital methods have many well-known failings, but can perform reasonably
well for some systems (see section 5.2.1). Ab initio calculations at the Hartree-Fock level perform poorly for many reactions (indeed, semiempirical QM/MM calculations may be superior to low-level ab initio QM/MM treatments in some cases), but inclusion of electron correlation significantly increases the computational expense. It may also be necessary to use large basis sets to achieve reasonable accuracy. Pseudopotentials [56] or effective fragment potentials [109,110], representing atomic cores or groups not directly involved in electronic changes, may be useful in reducing the computational demands of ab initio calculations. Density-functional theory based QM/MM methods show great promise, as they should allow calculations that include electron correlation, with good scaling properties [121,122,148,198]. At present, however, ab initio QM/MM calculations on enzyme reactions are very time-consuming, and are generally limited to geometry optimization and the study of a limited number of structures (e.g. representative reactant and intermediate geometries) [19]. They may be particularly useful for developing, testing and calibrating more approximate methods such as semiempirical QM/MM, empirical valence bond or quantum mechanical/free energy perturbation techniques.
5.2.1. Performance of semiempirical QM methods
Semiempirical QM/MM methods, while not suitable for all systems, allow bigger QM regions to be treated and more extensive calculations to be performed (e.g. molecular dynamics simulations and calculations of activation free energies). At present, AM1 [30] and PM3 [31] are the two most widely used semiempirical methods for QM/MM calculations. Each has advantages over the other in certain areas. The various methods have known faults, some of which are due to their approximate theoretical basis, whereas others are characteristic of a particular parameterization.
An example is that semiempirical methods are often found to overestimate barriers to reaction [7,35,132]. There is an ongoing debate about the relative merits of MNDO, AM1 and PM3. It is widely acknowledged that AM1 and PM3 are an improvement on the earlier MNDO method in most areas. However, the strengths and weaknesses of a method must be taken into account when deciding which to apply to a problem, and in interpreting the results. For example, although most molecular properties are predicted better by AM1 and PM3 than by MNDO, the latter has been found to reproduce ab initio electrostatic potentials more closely in some instances [131]. PM3 has been found to be more accurate than its predecessor AM1 for calculating hydrogen bond geometries [199,200]. In particular, AM1 predicts bifurcated hydrogen bonds: for molecules capable of donating two hydrogen bonds, such as water, AM1 predicts that both will be formed to a single acceptor atom, for example in the water dimer. This is contrary to experimental and high-level ab initio results showing
formation of a single hydrogen bond. PM3 correctly predicts non-bifurcated geometries [201]. The improved performance of PM3 is attributed to the reparameterized Gaussian terms in the core-core repulsion. However, hydrogen bond distances in PM3 are about 0.1-0.2 Å too short, and other non-bonded interactions may be misrepresented because of an underestimation of repulsive effects. AM1 is reported to treat C-H...O hydrogen bonds better than PM3 [202]. AM1 treats phosphorus badly, and this error is rectified in PM3 [132]. A serious error of PM3, not present in AM1, is that it calculates the charge on nitrogen atoms as being much too positive [7,201]. This is due to an error in the parameterization [203]. In models of acetyl-CoA enolization in citrate synthase, PM3 was found to perform less well than AM1, overestimating a proton transfer barrier (from N to O) and misrepresenting the charge distribution of the imidazole side chain of histidine [7]. PM3 has been found to calculate proton affinities of carboxylic acids more accurately than AM1 [204], but PM3 is subject to a greater degree of non-systematic error than AM1, making PM3 predictions less reliable in general [132]. Other authors have found both methods to be of similar accuracy for calculating acidities [205]. Problems with the PM3 core repulsion function have been identified, which cause difficulties for inter- or intramolecular O...H, N...H or N...O interactions [206]. For these reasons, AM1 is often a better choice, if only because its errors are more predictable and less erratic. However, it can be difficult to assess a priori whether AM1 or PM3 is likely to be more accurate for a particular application. This, and the limited accuracy of all these methods, makes it necessary to test their performance for small model systems against experiment and high-level ab initio results (which should include electron correlation for consistency with the semiempirical calculations) [7,19,35].
Further development of semiempirical methods [207], such as the MNDO/d method of Thiel and coworkers [166], should increase their accuracy and range of applicability. It may be possible in some cases to improve semiempirical calculations through the use of parameters optimized for the particular reaction of interest. The parameters within the semiempirical method are altered so that the results (e.g. for the activation barrier and energy change on reaction) are closer to high-level ab initio findings. Genetic algorithms have been used in the optimization process. This approach has been applied in QM/MM calculations [126,162] and to study the reactions of small organic molecules [208,209]. However, some workers have suggested that it may be difficult to fit several properties of good potential energy surfaces at once [210].
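The reparameterization idea can be illustrated with a toy genetic algorithm. This is only a sketch under strong assumptions:

- `toy_semiempirical` is a hypothetical stand-in that maps two adjustable parameters linearly onto a barrier and a reaction energy.
- The reference values are invented placeholders, not results from refs [126,162,208,209].

In a real application, each fitness evaluation would require semiempirical calculations on the model reaction.

```python
import random

# Hypothetical reference data (placeholders for high-level ab initio values), kcal/mol.
REFERENCE = {"barrier": 15.0, "reaction_energy": -2.0}

def toy_semiempirical(params):
    """Stand-in for a semiempirical calculation: maps two adjustable
    parameters to a barrier and a reaction energy. Purely illustrative."""
    p1, p2 = params
    return {"barrier": 10.0 * p1 + 2.0 * p2,
            "reaction_energy": -5.0 * p1 + 4.0 * p2}

def error(params):
    """Fitness to minimize: sum of squared deviations from the reference values."""
    calc = toy_semiempirical(params)
    return sum((calc[key] - REFERENCE[key]) ** 2 for key in REFERENCE)

def genetic_fit(pop_size=40, generations=200, sigma=0.05, seed=1):
    rng = random.Random(seed)
    population = [[rng.uniform(0.0, 3.0), rng.uniform(0.0, 3.0)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=error)
        parents = population[: pop_size // 2]      # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover (pick each gene from either parent), then Gaussian mutation.
            child = [rng.choice(genes) + rng.gauss(0.0, sigma)
                     for genes in zip(mother, father)]
            children.append(child)
        population = parents + children            # parents survive unchanged (elitism)
    return min(population, key=error)

best = genetic_fit()
print(best, error(best))
```

Because the parents are carried over unchanged, the best parameter set found so far is never lost. With several target properties, the fitness is a weighted sum of deviations. The choice of weights then largely decides which properties are reproduced well, which is one way to see why fitting several properties at once can be difficult.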
5.3. Definition of the QM region
A centrally important issue is the choice of the QM system, i.e. which atoms to include in the electronic structure calculation, and which to represent through
molecular mechanics. Clearly, all groups believed to participate directly in the chemical changes must be included in the QM region. It may be desirable, or essential, to include neighbouring groups in the QM region as well. These could include charged species, or groups involved in hydrogen bonds with QM atoms. Generally, a larger QM region should give better results (at increased computational cost), although this may not always be true. For example, semiempirical methods treat hydrogen bonds poorly, and such interactions can sometimes be better treated with one of the groups represented by MM, particularly when the (MM) van der Waals parameters of the QM and MM groups have been optimized to reproduce good ab initio results [162]. Semiempirical methods may also perform poorly for conformational properties, as in their inadequate treatment of peptide bonds [18,175], for which an MM correction is often applied [132]. In most enzymes, the QM and MM regions will be covalently bonded, and so a decision must also be made on where to partition molecules. The boundary between covalently bonded QM and MM atoms should be chosen to be as far as possible from sites of chemical or electronic change, and should not disrupt conjugated systems. Ideally, the boundary also should not be in very close proximity to highly charged groups. A carbon-carbon single bond in an aliphatic group is a good choice for partitioning (e.g. the Cα-Cβ bond in an amino acid, to allow a side chain involved in catalysis to be treated by QM). The choice of QM region should be considered carefully, and the sensitivity of the results to changes in the size of the QM region should be tested. To ensure that interactions between the QM and MM regions are represented correctly, results for model systems should be tested against high-level ab initio calculations (or experimental data).
If necessary, the MM parameters (atomic charges and van der Waals parameters for the MM atoms, van der Waals parameters for the QM atoms) should be adjusted to give good interaction energies and geometries.
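The text does not prescribe how the severed covalent bond at the QM/MM boundary is capped. One common choice is a hydrogen 'link atom' placed on the line from the QM boundary atom towards the MM boundary atom. The sketch below assumes that scheme and a C-H distance of 1.09 Å. The function name is hypothetical. Other boundary treatments, such as frozen or hybrid orbitals, exist and are not shown.

```python
import math

def link_atom_position(qm_atom, mm_atom, bond_length=1.09):
    """Return the position of a hydrogen link atom capping the QM region.

    The link atom lies on the line from the QM boundary atom towards the MM
    boundary atom, `bond_length` angstrom from the QM atom (1.09 A, a typical
    C-H distance, is an assumption).
    """
    direction = [m - q for q, m in zip(qm_atom, mm_atom)]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("QM and MM boundary atoms coincide")
    return tuple(q + bond_length * (d / norm) for q, d in zip(qm_atom, direction))

# Side chain treated by QM, backbone by MM: the C-alpha/C-beta bond is cut.
c_beta = (0.0, 0.0, 0.0)     # QM boundary atom (hypothetical coordinates, angstrom)
c_alpha = (1.53, 0.0, 0.0)   # MM boundary atom
h_link = link_atom_position(c_beta, c_alpha)
print(h_link)
```

Placing the cap along the original bond keeps the QM fragment's geometry close to that of the intact molecule. This is one reason a non-polar C-C single bond far from the reacting centre is preferred as the cut.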
5.4. Mechanistic questions
Apart from technical considerations, it is important to identify what mechanistic questions can be addressed by the calculations. For example, different possible candidates for an active site base could be compared, or perhaps the stability of various proposed intermediates could be studied. There is a wealth of unanswered questions regarding aspects of specific enzyme reaction mechanisms, and also on the general principles of enzyme catalysis (e.g. what factors or interactions are most important in reducing the activation energy, how the enzyme reaction compares to the equivalent reaction in solution, etc.). Different types of calculation within the QM/MM framework may be required to address different types of question, as demonstrated by the variety of applications and approaches described in section 6. Consider what
experimental data exist for the enzyme to be studied, what aspects of the mechanism they define, and what questions they leave unanswered. It is important to have in mind how connections can be made to experimental results such as kinetic and mutagenesis data. These can test the theoretical approaches, which can then be used to make predictions that can be tested experimentally. An example of such a connection is the correlation of QM/MM activation energies with experimentally observed rate constants for the hydroxylation of a series of aromatic substrates by para-hydroxybenzoate hydroxylase [49], described in section 6.1 below.
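Such a comparison rests on transition state theory. If the calculated barrier tracks the activation free energy, ln k should fall linearly with the barrier, with a slope of -1/RT (about -1.69 per kcal/mol at 298 K). The sketch below, in plain Python with illustrative function names, computes three quantities:

- the Pearson correlation coefficient for paired data;
- the least-squares slope for the same data;
- the theoretical slope.

Potential energy barriers are not free energies, so a fitted slope need not equal -1/RT. If the calculated barriers were uniformly too high by a constant factor, the slope would be shallower by that factor, while r would be unchanged.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fitted_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def tst_slope(temperature=298.15):
    """d(ln k)/d(barrier) from the Eyring equation, per kcal/mol."""
    return -1.0 / (R_KCAL * temperature)

print(round(tst_slope(), 3))  # -1.688
```

Given lists of calculated barriers and the corresponding ln(kcat) values, `pearson_r(barriers, ln_k)` returns the kind of correlation coefficient quoted for Figure 3. Comparing `fitted_slope` with `tst_slope` then indicates whether the barriers are systematically scaled.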
6. SOME RECENT APPLICATIONS
In this section, some recent studies of enzyme-catalysed reaction mechanisms are discussed, to illustrate the types of approaches that have been used, the mechanistic questions that have been investigated along with the insight that has resulted from the calculations, and some practical aspects of the simulations. These applications represent only a small selection of the many studies of enzymes by QM/MM techniques that have been carried out by many groups in recent years (see section 2.1 above). In some cases, these are applications in which the author has been directly involved. Space permits only a limited discussion here, and the interested reader is urged to refer to the publications cited here and above for more details.
6.1. Para-hydroxybenzoate hydroxylase (PHBH)
A recent study of para-hydroxybenzoate hydroxylase (PHBH) by Ridder et al. [49,50] provides an interesting example of the validation of QM/MM calculations on an enzyme mechanism by comparison with experimental data. The correlation found between calculated activation barriers and the logarithm of experimental rate constants for a series of alternative substrates also provides support for the proposed mechanism of hydroxylation. These studies are a good example of QM/MM reaction pathway calculations for an enzyme, including technical aspects of system set-up and practical considerations, and so will be outlined here in some detail. PHBH catalyses the hydroxylation of para-hydroxybenzoate (at the 3-position, adjacent to the hydroxy group). This is an important step in the degradation of a wide variety of aromatic compounds by microbes. PHBH serves as a model enzyme for the family of external flavoprotein monooxygenases. The reaction cycle of this enzyme is complex, involving initial binding of substrate, followed by two-electron reduction of the flavin cofactor by NADPH. Incorporation of molecular oxygen results in a C4a-peroxyflavin intermediate. The distal oxygen of the C4a-peroxyflavin is protonated to form the C4a-hydroperoxyflavin. It is this C4a-hydroperoxyflavin intermediate which is believed to react with the substrate in the hydroxylation step. Final release of water returns the flavin cofactor to its original oxidized state. This reaction cycle is supported by transient kinetic studies of the similar enzyme phenol hydroxylase [211]. Two (of several) intermediates observed in these experiments were believed to be the C4a-peroxyflavin and C4a-hydroperoxyflavin, respectively. These results support the assignment of the C4a-hydroperoxyflavin as the form of the cofactor that reacts with the substrate in the hydroxylation step. A number of mechanisms have been proposed for the hydroxylation step itself in PHBH. These include an electrophilic aromatic substitution type mechanism, proceeding via heterolytic cleavage of the peroxide bond (this mechanism was one of the earliest proposals). Other hydroxylation mechanisms have been put forward in which homolytic cleavage of the peroxide bond is invoked. Thus far it has not been possible to distinguish between these mechanisms experimentally, in part because of the short lifetimes of the reaction intermediates [211]. The balance of the experimental data has been interpreted as favouring the heterolytic cleavage mechanism, but this conclusion was far from certain [50]. Within this mechanism, the electrophilic attack of the C4a-hydroperoxyflavin intermediate on the substrate is believed to be the rate-limiting step under physiological conditions (25 °C, pH 8). Another important issue concerns how substrates may be activated for reaction by the enzyme: it is believed that this is achieved by deprotonation of the hydroxy group [50]. Ridder et al. [49,50] have investigated the substrate hydroxylation step in PHBH using the QM/MM method of Field et al. [41], implemented in the (academic) simulation package CHARMM [116].
The crystallographic structure of PHBH with the flavin cofactor in the oxidized form and the substrate para-hydroxybenzoate bound was used for the calculations. This structure is a good model of the reactive complex: no significant conformational changes are expected during the reaction cycle, because the structures of the product-PHBH complex and of the reduced cofactor-PHBH complex are very similar to that of PHBH containing the reduced cofactor and substrate. The structure of the flavin was modified to the reactive C4a-hydroperoxyflavin form. Hydrogens were added to the structure (hydrogen atoms are typically not observed in protein crystal structures solved by X-ray diffraction), using a 'united atom' topology (only hydrogen atoms on polar groups represented) for groups in the molecular mechanical region [212], with all atoms represented explicitly in the quantum mechanical region. The 330 water molecules present in the crystal structure were included in the calculations. Because the active site is buried within the protein, it was felt unnecessary to add more water molecules.
Initial energy minimization was performed for a sphere of 10 Å radius around the distal oxygen of the C4a-hydroperoxyflavin. This sphere contained 295 atoms in total, including the substrate, cofactor, several amino acid residues and 15 water molecules. All other atoms in the structure were constrained to remain in their initial positions. The QM region in these calculations contained 50 atoms, while the whole system included 4840 MM atoms. This is a reasonably large system to study, but these calculations were feasible on standard workstations. A nonbonded cutoff of 11 Å was employed, and the energy of the system was minimized (by an adopted basis Newton-Raphson algorithm [9]) to a gradient tolerance of 0.01 kcal/mol/Å (approximately 780 steps of minimization). This minimized structure was used to begin reaction pathway calculations on the hydroxylation step.
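The set-up just described can be sketched as follows: atoms inside a sphere are free to move, all others are held fixed, and the energy is minimized to a gradient tolerance. This is an illustration under stated assumptions, not the CHARMM procedure:

- Plain steepest descent replaces the adopted basis Newton-Raphson algorithm.
- The convergence test uses the RMS gradient over the mobile atoms only.
- The harmonic 'tether' energy is a hypothetical stand-in for the QM/MM energy function.

```python
import math

def select_mobile(coords, centre, radius=10.0):
    """Indices of atoms within `radius` angstrom of `centre`; the rest stay fixed."""
    return [i for i, xyz in enumerate(coords) if math.dist(xyz, centre) <= radius]

def tether_gradient(targets, k=10.0):
    """Gradient of a toy energy sum_i k*|x_i - t_i|^2 (stand-in for QM/MM)."""
    def gradient(coords):
        return [[2.0 * k * (c - t) for c, t in zip(xyz, tgt)]
                for xyz, tgt in zip(coords, targets)]
    return gradient

def minimize(coords, gradient, mobile, gtol=0.01, step=0.01, max_steps=10000):
    """Steepest descent on the mobile atoms only, until the RMS gradient over
    their coordinates drops below gtol (kcal/mol/angstrom)."""
    coords = [list(xyz) for xyz in coords]
    for n_steps in range(max_steps):
        grad = gradient(coords)
        components = [g for i in mobile for g in grad[i]]
        grms = math.sqrt(sum(g * g for g in components) / len(components))
        if grms < gtol:
            return coords, n_steps
        for i in mobile:                  # atoms outside the sphere are never moved
            for axis in range(3):
                coords[i][axis] -= step * grad[i][axis]
    return coords, max_steps

# Hypothetical three-atom system: the third atom lies outside the 10 A sphere.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
targets = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (21.0, 0.0, 0.0)]
mobile = select_mobile(start, centre=(0.0, 0.0, 0.0))
final, n_steps = minimize(start, tether_gradient(targets), mobile)
print(mobile, final[2], n_steps)
```

The atom outside the sphere keeps its starting position even though its own gradient is non-zero. This mirrors the constraint on all atoms beyond the 10 Å sphere described above.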
[Figure 3: ln kcat plotted against QM/MM activation energy (kcal/mol; horizontal axis approximately 17-23 kcal/mol) for para-hydroxybenzoate and its derivatives.]
Figure 3. Plot of the logarithm of experimentally determined rate constants (kcat, min⁻¹) against energy barriers calculated with a QM/MM method for hydroxylation of several para-hydroxybenzoate derivatives by the enzyme para-hydroxybenzoate hydroxylase (PHBH), showing a linear correlation (r = 0.96) between the calculated and experimental results [49,50]. This correlation supports the proposed mechanistic scheme, and the identification of the hydroxylation step as rate-limiting within it. It also validates the QM/MM method for this application, and shows that QM/MM results can be predictive and will be useful in the development of quantitative structure-activity relationships (QSAR). (Adapted from ref. 49, with thanks to Dr. L. Ridder.)
The hydroxylation step was described by a reaction coordinate consisting of the difference between the breaking (peroxide distal oxygen-proximal oxygen)
and forming (distal oxygen-substrate C3) bond lengths, i.e. r = d(Op-Od) - d(Od-C3). The reaction coordinate was harmonically restrained (with a force constant of 5000 kcal mol⁻¹ Å⁻², using the CHARMM RESD function [6,149]) to a series of values intermediate between the reactants and products. This converted the initial hydroperoxyflavin complex to the cyclohexadienone product complex in a series of steps of 0.1 Å. It was found to be preferable to restrain active-site water molecules loosely to their initial positions, to avoid large changes in their positions and related large changes in energy, which could introduce discontinuities along the reaction path. The position of the energy maximum along the path, representing the approximate transition state for the reaction, was determined more precisely by varying r in steps of 0.01 Å in this region. Harmonic vibrational analysis of the approximate transition state structure (treating the QM atoms only) verified that it possessed only one significant imaginary frequency, corresponding to the transfer of the OH group, as required. QM/MM reaction pathway calculations also located approximate transition states for four fluorinated alternative substrates [49]. Barriers to reaction were calculated as the difference in total energy between the approximate transition state and the appropriate reactant complex for each substrate. Experimental data on the rates of conversion of the fluorinated substrates were available [49]. The experimental and QM/MM results were compared by plotting ln(kcat) against the calculated activation energy. This showed a good correlation (r = 0.96, Figure 3). Analysis of the QM/MM results showed that differences in reactivity between the substrates are likely to be due to electronic factors. In absolute terms, the calculated barriers are too high, by a factor of approximately 1.5.
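The restrained scan can be outlined in code. The sketch rests on two assumptions:

- The harmonic restraint has the form k(r - r0)^2, with no factor of one half. This should be checked against the conventions of the program actually used.
- `energy_profile` is any callable returning the minimized energy at a given value of r. Here a hypothetical analytic profile stands in for the restrained QM/MM minimizations.

A coarse 0.1 Å scan followed by a 0.01 Å refinement around the highest point mirrors the procedure described above. The function names are illustrative and are not CHARMM commands.

```python
import math

def reaction_coordinate(o_p, o_d, c3):
    """r = d(Op-Od) - d(Od-C3) in angstrom: negative for the reactant
    (peroxide bond intact, Od far from C3), positive for the product."""
    return math.dist(o_p, o_d) - math.dist(o_d, c3)

def restraint_energy(r, r_target, k=5000.0):
    """Harmonic restraint k*(r - r_target)**2 in kcal/mol (k in kcal mol^-1 A^-2)."""
    return k * (r - r_target) ** 2

def grid(start, stop, step):
    """Evenly spaced values from start to stop inclusive (stop >= start)."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]

def locate_maximum(energy_profile, start, stop, coarse=0.1, fine=0.01):
    """Coarse scan of r, then a finer scan around the highest coarse point."""
    r_best = max(grid(start, stop, coarse), key=energy_profile)
    lo, hi = max(start, r_best - coarse), min(stop, r_best + coarse)
    return max(grid(lo, hi, fine), key=energy_profile)

def toy_profile(r):
    """Hypothetical energy profile (kcal/mol) with its maximum at r = 0.23 A."""
    return 20.0 - 8.0 * (r - 0.23) ** 2

print(reaction_coordinate((0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (4.5, 0.0, 0.0)))  # -1.5
print(round(locate_maximum(toy_profile, -1.5, 1.5), 2))  # 0.23
```

With a stiff restraint such as 5000 kcal mol⁻¹ Å⁻², a deviation of only 0.05 Å from the target already costs about 12.5 kcal/mol. The restrained structures therefore stay close to the requested values of r.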
The overestimation of the barriers was attributed to the AM1 QM treatment (the performance of AM1 has been tested by comparison with ab initio RHF/6-31G(d) and B3LYP/6-311G(d,p) results for small models of the reaction [50]). The correlation between calculated and experimental results does show, however, that the QM/MM results are useful for comparisons of related substrates, and indicates the potential of this approach for predicting the efficiency with which an enzyme may convert new substrates. Such predictions could be useful in environmental clean-up operations. In addition, the correlation lends support to the modelled mechanism being truly representative of the enzyme-catalysed reaction, and to the proposal that the hydroxylation step is rate-limiting [49,50]. Further insight into the mechanism was provided by QM/MM analysis of the roles of active site groups in the hydroxylation step in PHBH. Qualitative information on nearby groups that significantly affect the reaction energetics can be obtained by a simple decomposition procedure described as a first-order perturbation analysis [13,48]. In essence, this involves the calculation of the
energy of previously optimized reactant, product and transition state complexes while including a reduced number of the surrounding chemical groups (e.g. individual amino acids). Comparison of the reaction energy or barrier with and without a particular group present gives an indication of the effect of that group on the reaction step. This amounts to a difference in interaction energies, and cannot be directly related to, for example, experimentally measured free energy differences for mutant enzymes. (ΔΔG values for particular mutations can be calculated by free energy perturbation techniques, involving for example the mutation of an MM amino acid side chain, but the sampling required makes this approach highly computationally intensive for QM/MM calculations.) Inclusion of effects such as solvent dielectric screening is also problematic. Nevertheless, the worth of this simple approach has been demonstrated by the identification of important catalytic and binding groups in several enzymes [8,18,48,51,61]. In the case of PHBH, a proline residue (Pro293) was found to stabilize the transition state for hydroxylation through an interaction between its carbonyl oxygen and the Hd atom of the transferring OH group. This interaction with Pro293 stabilizes the transition state specifically, having very little effect on the energy difference between the reactants and products (the interaction is only present in the transition state), and so appears to be a catalytic effect in the purest sense [50]. Intriguingly, a similar effect of a proline has been found in calculations on the similar hydroxylation reaction catalysed by phenol hydroxylase (Ridder et al., work in progress).
The proline residue appears to be conserved in flavoprotein monooxygenases, and so it is tempting to speculate that the particular geometry allowed or enforced by proline at this position places its main chain carbonyl group in a good position to stabilize the transition state for substrate hydroxylation. The QM/MM calculations also strongly support the importance of substrate deprotonation for reaction in PHBH. QM/MM activation barriers for hydroxylation of the protonated substrate, and for para-fluorobenzoate (which is known not to be hydroxylated by PHBH, although it binds and initiates the reaction cycle), were twice as high as that for the deprotonated substrate [50].
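The bookkeeping behind the first-order perturbation analysis described above is simple. In this minimal sketch, each argument is a hypothetical dictionary of energies (kcal/mol) for the reactant ('R'), transition state ('TS') and product ('P') complexes. These energies are evaluated at the same previously optimized geometries, with and without a given group. Producing them with a QM/MM program is not shown. The numbers in the usage lines are invented for illustration and are not the values reported for Pro293.

```python
def barrier(energies):
    """Energy barrier E(TS) - E(R)."""
    return energies["TS"] - energies["R"]

def reaction_energy(energies):
    """Reaction energy E(P) - E(R)."""
    return energies["P"] - energies["R"]

def group_effect(with_group, without_group):
    """Change in barrier and reaction energy caused by including one group.

    Negative values mean the group lowers the barrier (or makes the reaction
    more exothermic). These are differences of energies at fixed geometries,
    not free energies, so they cannot be compared directly with mutant data.
    """
    d_barrier = barrier(with_group) - barrier(without_group)
    d_reaction = reaction_energy(with_group) - reaction_energy(without_group)
    return d_barrier, d_reaction

# Invented numbers: a group that stabilizes only the transition state.
full = {"R": 0.0, "TS": 20.0, "P": -5.0}
group_removed = {"R": 0.0, "TS": 23.0, "P": -5.0}
print(group_effect(full, group_removed))  # (-3.0, 0.0)
```

A sizeable barrier effect combined with a negligible effect on the reaction energy is the pattern described above for Pro293: a group acting specifically on the transition state.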
6.2. Citrate synthase
For some enzymes, for example many metalloenzymes, a semiempirical QM/MM treatment is inadequate because of the limitations of the semiempirical methods. In such situations, a more sophisticated level of QM treatment (such as ab initio molecular orbital or density-functional theory) may well be required. A recent example of the application of ab initio QM/MM techniques to an enzyme mechanism is a study of acetyl-CoA enolization in citrate synthase
636
[6]. Ab initio QM/MM methods are important in this case because of the possible role of strong hydrogen bonds in this enzyme. Citrate synthase catalyses the formation of citrate from oxaloacetate and acetyl-CoA in the citric acid (or Krebs) cycle. The reaction is believed to proceed via deprotonation of acetyl-CoA (Figure 4) to form an intermediate (deprotonation of acetyl-CoA is thought to be the rate-limiting step based on kinetic isotope effects), which subsequently attacks oxaloacetate. A citryl-CoA intermediate is formed, which is then hydrolysed to give the products. A central question in this mechanism has been the nature of the nucleophilic intermediate formed in the first step [7, 19,51 ]. Early suggestions were that the enolate of acetyl-CoA was the probable intermediate, but this was later thought to be unlikely on energetic grounds [213], given the large difference in p Ka between the thioester of acetyl-CoA (pK~ estimated at around 21) and the enzymic base (Asp375 in pig CS, apparent pKa 6.5). Concerted acid-base catalysis to form the enol of acetyl-CoA was proposed as a more likely alternative, and this mechanism was apparently supported by structures of the pig and chicken enzymes, showing a conserved histidine (His274) residue positioned well to act as the acid and donate a proton to the carbonyl oxygen of acetyl-CoA [213]. However, comparison with triosephosphate isomerase, which similarly deprotonates a weakly acidic carbon acid substrate using a similar combination of amino acid side chains, and subsequent closer examination of the hydrogen bonding environment of His274, indicated that this residue is in the neutral form (bearing a proton only on N 81) in citrate synthase [ 18,51 ]. This removes the energetic rationale for enol formation, as neutral histidine is a very weak acid (pKa = 14). The nature of the intermediate formed in the deprotonation step, and the means of its stabilization, therefore remained uncertain. 
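The "energetic grounds" mentioned above can be made concrete with the standard thermodynamic relation ΔG° = ln(10)·RT·ΔpKa for transferring a proton from an acid to a base. The sketch below applies this textbook relation to the values quoted above (thioester pKa about 21, Asp375 apparent pKa 6.5) at an assumed temperature of 298.15 K. It deliberately ignores any shift of these pKa values by the enzyme environment, which is exactly what the QM/MM calculations set out to probe; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def proton_transfer_free_energy(pka_donor, pka_acceptor_conj_acid,
                                temperature=298.15):
    """Standard free energy change (kcal/mol) for moving a proton from a donor
    acid to a base whose conjugate acid has pKa `pka_acceptor_conj_acid`:
    dG = ln(10) * R * T * (pKa_donor - pKa_conjugate_acid)."""
    delta_pka = pka_donor - pka_acceptor_conj_acid
    return math.log(10.0) * R_KCAL * temperature * delta_pka


if __name__ == "__main__":
    # Values quoted in the text: acetyl-CoA thioester ~21, Asp375 ~6.5.
    cost = proton_transfer_free_energy(21.0, 6.5)
    print(f"Estimated cost of enolate formation: {cost:.1f} kcal/mol")
```

At 298 K each pKa unit corresponds to about 1.36 kcal/mol, so the 14.5-unit gap amounts to roughly 20 kcal/mol, which is why direct enolate formation was considered unfavourable.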
A somewhat controversial proposal [214,215] has been made that a special class of 'low-barrier' or 'short, strong' hydrogen bonds is responsible for stabilizing intermediates in a large number of enzyme reactions, including citrate synthase. In a low-barrier hydrogen bond, there is little or no energetic penalty for proton transfer from one hydrogen-bonded partner to the other, and the barrier to transfer is small or non-existent (in the most extreme cases, there is a single minimum with the proton equidistant between the two partners). Such hydrogen bonds, with large hydrogen bond energies, are found in some charged complexes in the gas phase. It has been proposed that such bonds can be formed in enzyme active sites, and that they stabilize otherwise highly unstable reaction intermediates. This proposal has been the subject of much debate and disagreement [19,51]. For citrate synthase, it was proposed that His274 shares a proton with deprotonated acetyl-CoA in an 'enolic' intermediate form [214], this term indicating that it is neither fully the enol nor the enolate, but a strongly hydrogen bonded form intermediate between the two [215]. A requirement for this mechanism is that the effective pKa for deprotonation of His274 at the active site should be approximately equal to that for deprotonation of the enol.
Figure 4. Transition state structure, optimized at the AM1 QM level, for deprotonation of acetyl coenzyme A by Asp375 in citrate synthase, showing the side chains of Asp375 and His274, and the thioester portion of acetyl-CoA. His274 stabilizes the enolate intermediate formed in this step via an increase in the strength of a hydrogen bond to the carbonyl oxygen of the substrate. The reaction in the enzyme has been modelled by ab initio QM/MM calculations at the RHF/6-31G(d) QM level, with MP2 correlation corrections [6]. Initial QM/MM modelling was carried out at the AM1 semiempirical level [51], which allowed the necessary extensive preliminary energy minimization of the whole system. The ab initio QM/MM geometries were shown to be properly optimized by comparison with results for model systems [7,19]. The QM/MM results [6,51] do not support the proposal that a 'low-barrier' hydrogen bond is involved in catalysing this initial step in the citrate synthase reaction.
This question has been investigated by ab initio QM/MM calculations on the first step of the citrate synthase reaction [6,19]. A high-resolution crystal structure of chicken citrate synthase complexed with acetyl-CoA and the inhibitor R-malate [213] was used for these studies; this should be a good model of the reactive ternary complex, as R-malate binds in the same way as oxaloacetate. The enzyme-substrates complex was first subjected to QM/MM minimization at the AM1 semiempirical level [51]. Given the likely importance of histidine side chains in the mechanism, particular attention was paid to choosing the correct protonation states for three conserved histidine residues at the active site. The approach taken was to minimize the energy of the complex with these histidines in all possible charge states [51]. It was concluded that His274 and His320 were neutral (in the Nδ1 and Nε2 singly protonated tautomeric forms, respectively).
A third histidine, His238, which binds oxaloacetate, remained somewhat uncertain, but there were some indications that the positively charged (doubly protonated) form produced better results, and this form was chosen for the ab initio QM/MM calculations. The performance of different semiempirical methods, and of different levels of ab initio treatment, was tested by calculations on small models of the reaction [7,19,36,216]. The system treated in the ab initio QM/MM calculations included all residues with at least one heavy atom within 17 Å of the terminal carbon of acetyl-CoA in the crystal structure. Atoms between 14 Å and 16 Å from the centre of the simulation system were subject to harmonic restraints, while those further than 16 Å from the centre were fixed in place. All other atoms were free to move. The CHARMM program [116] was used throughout, interfaced [84,121,217] with the GAMESS-US package [151] for the ab initio QM/MM calculations. CHARMm22 MM parameters [212] were used. The enzyme-substrates complex was initially minimized at the semiempirical (AM1) QM/MM level, treating the side chains of Asp375 and His274, and the thioester part of acetyl-CoA, as QM [51]. Reaction pathway calculations, similar to those described above for PHBH, generated the structures of the enolate and enol of acetyl-CoA in citrate synthase. The substrates, enolate and enol complexes were all extensively optimized at the semiempirical QM/MM level (over 1000 steps of conjugate gradients minimization [51]). This was essential to achieve properly minimized structures for the MM part of the system, as ab initio QM/MM calculations are costly and very time consuming [19]. In the ab initio QM/MM calculations, the side chains of His274 and Asp375, the thioester of acetyl-CoA, and a crystallographically observed water molecule identified as important from the semiempirical results [51] were treated as QM atoms. 'Link' atoms (hydrogens) were used, and all QM atoms (including link atoms) interacted with all MM atoms except those of the neighbouring covalently bonded groups.
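The zone scheme above (free atoms near the centre, a harmonically restrained shell between 14 Å and 16 Å, and fixed atoms beyond 16 Å) can be sketched as a simple distance-based classification. This is a minimal illustration, assuming Cartesian coordinates in Å stored as plain tuples; the function names and the force constant are hypothetical, and it does not reproduce how CHARMM itself sets up such restraints.

```python
import math


def classify_atoms(coords, centre, r_inner=14.0, r_outer=16.0):
    """Label atoms by distance (in Angstrom) from the centre of the system:
    'free' inside r_inner, 'restrained' between r_inner and r_outer,
    'fixed' beyond r_outer."""
    labels = []
    for xyz in coords:
        d = math.dist(xyz, centre)
        if d < r_inner:
            labels.append("free")
        elif d <= r_outer:
            labels.append("restrained")
        else:
            labels.append("fixed")
    return labels


def positional_restraint_energy(xyz, ref_xyz, k=5.0):
    """Harmonic restraint E = k * |r - r_ref|**2 on a 'restrained' atom.
    k (kcal/mol/A^2) is a hypothetical value; some programs include a
    factor of 1/2 in this expression."""
    return k * math.dist(xyz, ref_xyz) ** 2


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (15.0, 0.0, 0.0), (0.0, 20.0, 0.0)]
    print(classify_atoms(atoms, centre=(0.0, 0.0, 0.0)))
```

Restraining the intermediate shell, rather than switching abruptly from free to fixed atoms, lets the boundary region respond to changes in the active site while keeping the truncated system from drifting.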
Energy minimization was performed initially at the RHF/3-21G(d) QM/MM level (50 steps of steepest descents and 100 steps of conjugate gradients by the Powell method for each structure), and extended to the RHF/6-31G(d) level (25 steps of steepest descents and 85 steps of conjugate gradients). The resulting structures were compared with fully optimized bimolecular complexes of the reaction in the gas phase [7], showing that the geometries produced by the QM/MM procedure were properly optimized. The transition state for proton abstraction from acetyl-CoA by Asp375, and the enolic intermediate, were modelled by restraining the transferring proton to be equidistant between the two heavy atoms between which it was exchanged (e.g. between Nδ1 of His274 and the acetyl oxygen of acetyl-CoA in the enolic complex). The Nδ1···O distance in the enolic structure was found to be 2.48 Å, a typical heavy atom separation for strongly hydrogen bonded complexes [7,215]. The effects of electron correlation were included by MP2/6-31G(d) single point calculations for the QM system only, using the geometries produced by QM/MM optimization. The computational demands are such that correlated ab initio QM/MM calculations on large systems are at present impractical [19] (the simulation system contained 33 QM atoms, 1618 MM protein atoms and 22 MM water molecules). The results showed the enolate form of acetyl-CoA to be markedly more stable than the enolic form or the enol within citrate synthase [6]. The enolate is therefore identified as the likely intermediate. It is stabilized by hydrogen bonds with His274 and a water molecule. These hydrogen bonds are 'normal', in that the proton appears in each case to be covalently bonded to the donor atom only, and the heavy atom separations are not unusually short. His274 and the water molecule donate hydrogen bonds to the carbonyl oxygen of acetyl-CoA in the substrates complex, but these hydrogen bonds are greatly strengthened on deprotonation of acetyl-CoA, as negative charge accumulates on it. This stabilizes the enolate relative to the substrate, reducing the energy required for its formation. Of all the forms of acetyl-CoA considered, the enol was found to be highest in energy. The results provide no indication that the acidity of neutral His274 is significantly increased by the protein environment. The criterion for formation of a low-barrier hydrogen bond, that the pKas of the bonded partners should be approximately equal [214,215], therefore does not appear to be met. Overall, the results do not support the proposal that a low-barrier hydrogen bond is involved in catalysis by citrate synthase. They show that a mechanism involving stabilization of the enolate of acetyl-CoA by normal hydrogen bonds from His274 and a water molecule is likely.
This mechanism is supported by calculations on models of the next step of the reaction, which show the enolate to be a good nucleophile for attack on oxaloacetate [216].
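The transition state and the enolic intermediate above were modelled by holding the transferring proton equidistant between donor and acceptor. One common way to impose such a condition is a harmonic penalty on the difference of the two distances, E = k(r_DH − r_HA)². The text does not specify the functional form, so this form, the force constant and the function name are assumptions; the sketch only shows how such a restraint and its force on the proton can be evaluated.

```python
import math


def equidistance_restraint(donor, proton, acceptor, k=100.0):
    """Energy and force on the proton for E = k * (r_DH - r_HA)**2.

    donor, proton, acceptor: (x, y, z) positions in Angstrom.
    k: hypothetical force constant in kcal/mol/A^2.
    Returns (energy, force_on_proton) with force = -dE/dr_proton.
    """
    r_dh = math.dist(donor, proton)
    r_ha = math.dist(proton, acceptor)
    delta = r_dh - r_ha
    energy = k * delta ** 2
    # Unit vectors from the proton toward the donor and toward the acceptor.
    u_d = [(d - p) / r_dh for d, p in zip(donor, proton)]
    u_a = [(a - p) / r_ha for a, p in zip(acceptor, proton)]
    # dE/dr_H = 2 k delta (u_a - u_d), so the force is 2 k delta (u_d - u_a):
    # a proton closer to the acceptor (delta > 0) is pushed back toward the donor.
    force = [2.0 * k * delta * (ud - ua) for ud, ua in zip(u_d, u_a)]
    return energy, force


if __name__ == "__main__":
    # Collinear N...H...O arrangement with a 2.48 A heavy-atom separation.
    n_pos, o_pos = (0.0, 0.0, 0.0), (2.48, 0.0, 0.0)
    for x_h in (1.00, 1.24, 1.50):
        e, f = equidistance_restraint(n_pos, (x_h, 0.0, 0.0), o_pos)
        print(f"H at {x_h:.2f} A: E = {e:.2f}, Fx = {f[0]:+.1f}")
```

Restraining only the distance difference leaves the heavy-atom separation free to relax, so a quantity such as the 2.48 Å N···O distance emerges from the optimization rather than being imposed.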
6.3. Human Immunodeficiency Virus Protease
Semiempirical QM/MM calculations, while considerably more computer intensive than purely molecular mechanical methods, do allow fairly rapid calculations on moderate to large systems. They can be used in molecular dynamics simulations, to characterize reaction pathways, to study the dynamics of enzyme reactions, and to carry out more extensive conformational sampling than is possible with ab initio techniques. Proteins are known to be complex dynamic entities, exhibiting large numbers of conformational substates, which makes such considerations very important [9]. Liu, Müller-Plathe and van Gunsteren [54] have investigated the catalytic mechanism of HIV protease by QM/MM molecular dynamics simulations, combining the PM3 semiempirical Hamiltonian with the GROMOS87 MM force field and the SPC water model, using GROMOS [22] interfaced with MOPAC [132]. The system studied included the entire enzyme dimer (99 amino acids per subunit) with a hexapeptide bound, in a periodic box of dimensions 51 Å × 54 Å × 72 Å, containing 5427 SPC water molecules. The side chains of the two catalytic aspartates/aspartic acids (Asp25 and Asp25'), the scissile peptide and a lytic water molecule were treated as QM atoms. Using a time step of 0.5 fs, simulations were performed for a few hundred picoseconds in total. Umbrella sampling was used to investigate several possible reaction pathways. In simulations in which the lytic water oxygen was brought together with the carbonyl carbon of the substrate, changes in the arrangement of the active site were observed, and at a short O-C distance, spontaneous concerted proton transfers (from Asp25' to the water oxygen and from the water to Asp25) were observed. This was the only reaction pathway leading to a reactive transition state. The mechanism suggested by the calculations [54] differs from that inferred from structural data alone.
6.4. Enolase
In an exciting recent application, Alhambra et al. [67] have used QM/MM methods to study quantum dynamical effects in the rate-limiting step of the conversion of 2-phospho-D-glycerate to phosphoenolpyruvate by enolase. This step is a proton abstraction from the substrate by Lys345. Part of the side chain of Lys345, plus the substrate, were included in the QM region, giving a total of 25 QM atoms. A generalized hybrid orbital method [140] was applied at the boundary between the QM and MM regions. AM1 was used for the QM system, with the CHARMM22 MM force field [20]. A stochastic boundary molecular dynamics simulation [9] was performed to equilibrate the structure, and to generate a representative configuration for the reaction (through the application of harmonic restraints to the breaking and forming bonds). The MM atoms were frozen in this configuration for all subsequent calculations.
The geometry of the QM system was then optimized to give reactant, product and transition state (saddle point) structures. A reaction path was generated for the QM system, and the free energy of activation along the path was calculated. The rate constant for the reaction was calculated by canonical variational transition state theory. Quantum tunnelling effects were included in the small curvature approximation [67]. Primary and secondary kH/kD kinetic isotope effects were calculated. The variationally optimized transition state geometries were found to be different for transfer of a proton and of a deuteron, the first indication of such a difference for an enzyme reaction [67]. Quantum treatment of vibrations was found to be important for the calculation of the rate constant, and variational transition state theory was important for calculating kinetic isotope effects. The primary kH/kD result found at the canonical variational transition state theory level with small curvature tunnelling corrections was 3.5, close to the experimental result of 3.3.
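In conventional transition state theory the rate constant follows from the activation free energy via the Eyring equation, k = (k_B·T/h)·exp(−ΔG‡/RT), and a kinetic isotope effect is the ratio of the rate constants for the light and heavy isotopologues. The sketch below shows only this conventional TST arithmetic with a transmission coefficient of 1. It does not implement the canonical variational theory or small-curvature tunnelling used by Alhambra et al., and the barrier values are hypothetical, chosen only to show how a small barrier difference translates into a KIE of a few units.

```python
import math

KB = 1.380649e-23          # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J s
R_KCAL = 1.987204e-3       # gas constant, kcal/(mol K)


def eyring_rate(dg_act, temperature=298.15):
    """Conventional TST rate constant (s^-1) for an activation free energy
    dg_act in kcal/mol, with the transmission coefficient set to 1."""
    prefactor = KB * temperature / H_PLANCK
    return prefactor * math.exp(-dg_act / (R_KCAL * temperature))


def kinetic_isotope_effect(dg_act_h, dg_act_d, temperature=298.15):
    """kH/kD as the ratio of the two conventional TST rate constants."""
    return eyring_rate(dg_act_h, temperature) / eyring_rate(dg_act_d, temperature)


if __name__ == "__main__":
    # Hypothetical effective barriers in kcal/mol; the deuteron barrier is
    # higher, as expected from the lower zero-point energy of the C-D bond.
    dg_h, dg_d = 15.0, 15.74
    print(f"kH    = {eyring_rate(dg_h):.3g} s^-1")
    print(f"kH/kD = {kinetic_isotope_effect(dg_h, dg_d):.2f}")
```

Because the prefactors cancel in the ratio, the KIE depends only on the difference in effective barriers, which is why tunnelling and the choice of transition state (variational or fixed) matter so much for isotope effects.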
6.5. Malate dehydrogenase
S-Malate is converted into oxaloacetate in the citric acid cycle by the action of malate dehydrogenase, with NAD as a cofactor. This reaction has been studied in detail by Cunningham et al. using semiempirical QM/MM methods [61]. The reaction is believed to proceed through a proton transfer between the substrate and a histidine residue, and a hydride transfer between the substrate and NAD. A key mechanistic question is the order of these steps, which is not readily resolved by experiment because in wild-type malate dehydrogenase the rate-limiting step appears to be a conformational change or product release. AM1 was used for the QM system, with link atoms for QM/MM partitioning, and CHARMM22 parameters [20] for the MM atoms. The van der Waals parameters used for the QM atoms had been developed by fitting QM/MM interaction energies (for representative small molecules with a TIP3P water molecule) to RHF/6-31G(d) results for the same systems [126,162]. These parameters had been tested through calculations of free energies of proton and hydride transfer in solution. Cunningham et al. also tested the energetic and structural results of AM1 for the reactions of interest against ab initio findings [61]. The simulation system consisted of all amino acid residues within 18 Å of the substrate malate (built into an inhibitor complex structure). A representative protein configuration was generated by stochastic boundary molecular dynamics using MM only, with the H-heavy atom distances for both the proton transfer and the hydride transfer constrained to 1.3 Å using SHAKE. Following this equilibration, QM/MM molecular dynamics was carried out for 40 ps in total at 300 K, and the system was then cooled to 0 K over a further 20 ps, before 5000 steps of minimization to generate a reference structure for calculations. A minimum energy surface was calculated by repeated minimizations at a range of reaction coordinate values.
From the minimum energy surface, it was concluded that, in the direction from malate to oxaloacetate, proton transfer precedes hydride transfer. The results indicate that malate dehydrogenase significantly alters the energetics of the reaction, making the energies of bound oxaloacetate and bound malate very similar. Energy decomposition analysis [48], as outlined in section 6.1 above, identified groups which significantly affect the proton transfer and hydride transfer steps.
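The "repeated minimizations at a range of reaction coordinate values" used above (often called adiabatic mapping) can be illustrated on a toy two-coordinate model: for each fixed value of the reaction coordinate q, the energy is minimized over the remaining coordinate s. In a real QM/MM calculation all other QM and MM degrees of freedom are minimized, usually with q held by a restraint. The model potential, the golden-section minimizer and all function names below are hypothetical choices made for this sketch.

```python
import math


def golden_section_min(f, a, b, tol=1e-8):
    """Minimize a unimodal 1D function on [a, b] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2.0


def adiabatic_map(energy, q_values, s_bounds=(-3.0, 3.0)):
    """For each fixed reaction-coordinate value q, minimize the energy over
    the remaining coordinate s; returns a list of (q, s_min, E_min)."""
    profile = []
    for q in q_values:
        s_min = golden_section_min(lambda s: energy(q, s), *s_bounds)
        profile.append((q, s_min, energy(q, s_min)))
    return profile


def toy_energy(q, s):
    """Hypothetical double well along q, harmonically coupled to s."""
    return (q * q - 1.0) ** 2 + 2.0 * (s - 0.5 * q) ** 2


if __name__ == "__main__":
    grid = [i / 4.0 - 1.0 for i in range(9)]  # q from -1.0 to 1.0
    for q, s, e in adiabatic_map(toy_energy, grid):
        print(f"q = {q:+.2f}  s_min = {s:+.3f}  E = {e:.3f}")
```

With two reaction coordinates (for example proton and hydride transfer distances), the same procedure over a 2D grid of restrained values gives a minimum energy surface from which the order of the steps can be read off.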
6.6. Lactate dehydrogenase
Similar mechanistic questions arise for lactate dehydrogenase, an enzyme which has been investigated with QM/MM methods by Moliner, Turner and Williams [58,167] and independently by Ranganathan and Gready [60]. Both groups applied a semiempirical QM treatment (AM1). Williams et al. used CHARMM22 all-atom MM parameters for the NADH cofactor and protein, whereas Gready et al. used the AMBER all-atom force field, with charges for the cofactor calculated from a fit to the AM1 electrostatic potential. There were also differences in the treatment of the QM/MM boundary: Ranganathan and Gready used QM methyl capping groups at the boundary (which do not interact with the MM atoms), whereas Moliner et al. used the link (hydrogen) atom method [41]. The positioning of the boundary between the QM and MM regions also differed somewhat, but in both cases the substrate, His195 and the nicotinamide ring of the cofactor were included as QM atoms. Gready and Ranganathan included the whole of the enzyme subunit (of dogfish lactate dehydrogenase), with crystallographic water molecules, in their calculations, whereas Williams et al. treated a truncated system, settling (after a number of tests) on a simulation system containing all residues within 12 Å of the carbonyl carbon of pyruvate (in a complex of B. stearothermophilus lactate dehydrogenase), placed within a ball of water molecules of radius 17 Å. Only a small number of 'key' MM residues were allowed to move in Ranganathan and Gready's work, whereas Williams et al. performed more extensive optimization of their system. Ranganathan and Gready calculated a QM/MM potential energy surface on a grid for two reaction coordinates representing proton and hydride transfer, respectively. This surface indicated a stepwise mechanism, with hydride transfer from NADH to pyruvate preceding proton transfer from His195 to the carbonyl oxygen of the substrate.
As the authors discussed [60], this result was unexpected, as this group and others have performed purely QM 'supermolecule' calculations on lactate dehydrogenase which indicate that proton transfer should occur before hydride transfer, and experimental studies of hydride transfer in solution also indicate proton transfer to be the first step. As outlined above, Bash et al. concluded that in the similar malate dehydrogenase reaction, proton transfer preceded hydride transfer. Turner, Moliner and Williams refined transition state structures in lactate dehydrogenase, with full gradient relaxation of between 1900 and 2000 atoms. These workers have developed the GRACE software package [167], designed to interface with programs such as CHARMM. The algorithms within GRACE allow refinement of transition state structures (saddle points), and calculation of Hessian matrices, vibrational frequencies and intrinsic reaction coordinates within large systems. In the application to lactate dehydrogenase, six configurations were generated from a previous (MM) molecular dynamics simulation, and each was subjected to QM/MM energy minimization. Subsequently, for each of the six structures, a potential energy surface was calculated on a grid by varying (and harmonically restraining) the hydride-pyruvate carbonyl carbon and His195 HE2-pyruvate carbonyl oxygen distances. In all cases, a single saddle point region was observed on these surfaces. A finer grid search in the saddle point region generated an approximate transition state for each of the six structures, and these were then refined to exact transition state structures. The intrinsic reaction coordinate (leading in one direction to reactants, and in the other to products) was calculated for each. The transition state structures showed some invariant essential features (e.g. the degree to which bonds are broken and formed), but differed significantly in the positions of the active site residues. They all indicated the reaction to be a concerted process, with proton transfer almost complete in the transition state, and hydride transfer making a dominant contribution to the transition vector [167]. No evidence was found for a reaction in which hydride transfer precedes proton transfer. It remains to be explored why the conclusions of these calculations differ from those of Ranganathan and Gready [60]. One possible cause of these apparent differences is that whereas Williams et al. used the total energy of the system for their energy surfaces, Ranganathan and Gready may instead have used the energy of the QM system (including the electrostatic effects of the protein on it) [218]. Williams et al. stress that the variation they observed indicates that the transition state for an enzymic reaction will represent an average of many related transition state structures [167]. These results underline the importance of accounting for structural changes and conformational variability in the enzyme.
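Refining a transition state, as described above, means confirming that the stationary point has exactly one negative Hessian eigenvalue (one imaginary frequency), whereas a minimum has none. The sketch below illustrates that check on a hypothetical two-coordinate model surface, using a central finite-difference Hessian and closed-form eigenvalues of the 2×2 symmetric matrix. Real calculations, such as those performed with GRACE, diagonalize the (mass-weighted) Hessian of thousands of degrees of freedom; the function names and model potential here are assumptions.

```python
import math


def hessian_2d(f, x, y, h=1e-4):
    """Central finite-difference Hessian entries (fxx, fxy, fyy) of f at (x, y)."""
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h**2)
    return fxx, fxy, fyy


def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], ascending."""
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return mean - radius, mean + radius


def classify_stationary_point(f, x, y):
    """Classify a point already known to be stationary by counting negative
    Hessian eigenvalues: 0 = minimum, 1 = first-order saddle (a transition
    state), 2 = maximum."""
    low, high = eigenvalues_2x2(*hessian_2d(f, x, y))
    negatives = (low < 0) + (high < 0)
    return {0: "minimum", 1: "first-order saddle", 2: "maximum"}[negatives]


def model_surface(x, y):
    """Hypothetical double well along x, harmonic along y."""
    return (x * x - 1.0) ** 2 + y * y


if __name__ == "__main__":
    for point in [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]:
        print(point, classify_stationary_point(model_surface, *point))
```

The eigenvector belonging to the single negative eigenvalue is the transition vector; following it downhill in both directions is the starting point for an intrinsic reaction coordinate calculation.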
6.7. Papain
Harrison, Burton and Hillier [62] have examined the cysteine protease papain by QM/MM calculations at the AM1 and B3LYP/3-21G(d) QM levels, using Gaussian94 in combination with AMBER. This coupling allowed optimization of the QM atoms in the field of the MM atoms. The QM and MM regions were not optimized simultaneously, but iterative minimization (minimization of the QM region with the MM region fixed, and separate minimization of the MM region with the QM region fixed) was possible. A model of papain with a substrate bound was constructed from an enzyme-inhibitor complex structure, energy minimized by MM alone, and solvated with 95 TIP3P water molecules. It was found that the enzyme stabilizes the ion pair formed by the thiolate of Cys25 and the imidazolium of His159 [62]. In vacuo, the neutral form of these residues is strongly preferred, but the zwitterionic form is more stable within the enzyme. Cys25 is the nucleophile that attacks the carbonyl group of the substrate. A concerted mechanism of amide hydrolysis was found [62]. The transition state structure for the reaction was located and characterized for the QM region, with the positions of some QM atoms frozen to accelerate convergence in the geometry optimization. The effects of replacing Asn175 by alanine were explored. This change reduced the stability of the zwitterionic form of the Cys-His pair relative to the neutral form by 3-4 kcal/mol, a finding in agreement with site-directed mutagenesis studies. The authors proposed that the role of Asn175 is to maintain His159 in the correct orientation relative to Cys25 in the active site.
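The iterative scheme used for papain above (relax the QM region with the MM region fixed, then the MM region with the QM region fixed, and repeat) is a form of block coordinate descent. The sketch below illustrates the loop on a hypothetical model with one coordinate per region and a coupling term, for which each block can be minimized exactly in closed form. In practice each "relax" step would be a full geometry optimization (per the text, the QM part in Gaussian94 and the MM part in AMBER); all names here are assumptions.

```python
def alternating_minimization(energy, relax_qm, relax_mm, x_qm, x_mm,
                             tol=1e-10, max_cycles=100):
    """Alternately relax the QM block with the MM block fixed, then the MM
    block with the QM block fixed, until the total energy changes by less
    than tol between cycles.

    relax_qm(x_mm) -> QM coordinates minimizing the energy for fixed x_mm
    relax_mm(x_qm) -> MM coordinates minimizing the energy for fixed x_qm
    Returns (x_qm, x_mm, final_energy, cycles_used).
    """
    e_old = energy(x_qm, x_mm)
    for cycle in range(1, max_cycles + 1):
        x_qm = relax_qm(x_mm)
        x_mm = relax_mm(x_qm)
        e_new = energy(x_qm, x_mm)
        if abs(e_new - e_old) < tol:
            return x_qm, x_mm, e_new, cycle
        e_old = e_new
    return x_qm, x_mm, e_old, max_cycles


# Hypothetical one-coordinate-per-region model with a coupling term.
def toy_energy(x_qm, x_mm):
    return (x_qm - 1.0) ** 2 + (x_mm + 2.0) ** 2 + 0.5 * x_qm * x_mm


def toy_relax_qm(x_mm):
    # Solves dE/dx_qm = 2 (x_qm - 1) + 0.5 x_mm = 0 for x_qm.
    return 1.0 - 0.25 * x_mm


def toy_relax_mm(x_qm):
    # Solves dE/dx_mm = 2 (x_mm + 2) + 0.5 x_qm = 0 for x_mm.
    return -2.0 - 0.25 * x_qm


if __name__ == "__main__":
    x_qm, x_mm, e, n = alternating_minimization(
        toy_energy, toy_relax_qm, toy_relax_mm, 0.0, 0.0)
    print(f"x_qm={x_qm:.6f} x_mm={x_mm:.6f} E={e:.6f} after {n} cycles")
```

This model converges in a few cycles because the coupling between the two regions is weak; when QM and MM coordinates are strongly coupled, such alternating schemes can need many cycles, which is one reason simultaneous optimization of both regions is preferred where it is available.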
6.8. Influenza neuraminidase
The question of whether a covalent intermediate is involved in the reaction catalysed by neuraminidase from influenza virus has been investigated through semiempirical QM/MM calculations by Thomas et al. [72]. AM1 was used for the QM treatment, with either the CHARMM22 [20] or the OPLS-AA [154] MM parameters. The systems treated [72] included between 47 and 73 QM atoms, 860 MM atoms which were free to move (within a sphere of radius 11 Å), and approximately 15250 fixed MM atoms (a selection radius of approximately 35 Å was used, including all protein atoms and one to two solvation layers at the protein surface). Free energies along reaction coordinates for different steps of the reaction were calculated by umbrella sampling, with molecular dynamics simulations performed with the particular reaction coordinate restrained sequentially to values between the reactants and products of a particular step. Quantum dynamical effects were studied by path integral simulations [72]. The results indicated that a covalent bond is not formed between the enzyme and the sialosyl cation intermediate. Instead, direct hydroxylation of the cation is favoured. However, only a relatively small energetic difference between these two alternative pathways was found, and so Thomas et al. suggest [72] that it may be possible to design effective neuraminidase inhibitors which do bind covalently to the enzyme.
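The umbrella sampling procedure above restrains the reaction coordinate ξ at a series of values between reactants and products, then removes the effect of each restraint. The sketch below shows the two basic ingredients: evenly spaced window centres with a harmonic bias, and the single-window unbiasing relation F(ξ) = −kT·ln P_biased(ξ) − w(ξ) + constant. The force constant, kT value and function names are hypothetical. Joining the windows into one continuous profile requires a further method such as WHAM, which is not shown.

```python
import math

KT = 0.596  # kcal/mol, i.e. R*T at about 300 K (assumed temperature)


def window_centres(xi_reactant, xi_product, n_windows):
    """Evenly spaced restraint centres from reactant to product (n_windows >= 2)."""
    if n_windows < 2:
        raise ValueError("need at least two windows")
    step = (xi_product - xi_reactant) / (n_windows - 1)
    return [xi_reactant + i * step for i in range(n_windows)]


def umbrella_bias(xi, centre, k=50.0):
    """Harmonic umbrella potential w(xi) = 0.5 * k * (xi - centre)**2."""
    return 0.5 * k * (xi - centre) ** 2


def unbias_window(bin_centres, counts, centre, k=50.0, kt=KT):
    """Free energy estimate (up to an additive constant) from one window's
    histogram: F(xi) = -kT ln P_biased(xi) - w(xi). Empty bins are skipped."""
    total = sum(counts)
    result = []
    for xi, n in zip(bin_centres, counts):
        if n > 0:
            f = -kt * math.log(n / total) - umbrella_bias(xi, centre, k)
            result.append((xi, f))
    return result


if __name__ == "__main__":
    print(window_centres(1.5, 3.0, 7))
```

For a flat underlying free energy the biased histogram is simply proportional to exp(−w/kT), and the unbiasing step recovers a constant profile; overlap between neighbouring windows is what allows their unknown additive constants to be matched.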
6.9. cAMP-dependent protein kinase
Hart et al. have studied the role of a conserved aspartate in protein kinases [79]. The QM/MM implementation was similar to that applied to papain by this group, as described above [62], combining Gaussian94 [153] and AMBER [21]. In cAMP-dependent protein kinase, PM3 was used to treat a QM region of 46 atoms, with the surrounding MM protein and crystallographic water (around 6200 MM atoms) held fixed. By comparing the energies of alternative possible product complexes, these authors suggested a mechanism in which the conserved aspartate does not function as a base, but instead remains deprotonated and stabilizes the phosphorylated serine of the product. The transition state for this reaction was characterized.
6.10. Chorismate mutase
Chorismate mutase provides an example of an enzyme where QM/MM calculations have identified an important catalytic principle at work [8]. This enzyme catalyses the Claisen rearrangement of chorismate to prephenate. The reaction within the enzyme is not believed to involve chemical catalysis, and this pericyclic reaction also occurs readily in solution. Lyne et al. [8] investigated the reaction in chorismate mutase by QM/MM calculations at the AM1 QM level (AM1 was found to perform acceptably well for this reaction in comparisons with ab initio results for the reaction in the gas phase [8]). Different sizes of QM system were tested in the QM/MM studies (e.g. the substrate alone, or the substrate plus up to three protein side chains), and similar results were found in all cases. The reaction was modelled by minimization along an approximate reaction coordinate, defined as the ratio of the lengths of the forming C-C and breaking C-O bonds. Values of the reaction coordinate were taken from the AM1 intrinsic reaction coordinate for the gas-phase reaction. It was found that in the enzyme, the gas-phase geometry of chorismate is not a minimum [8]. Instead, a point further along the reaction coordinate is lower in energy. These results indicate that the substrate is distorted on binding to the enzyme, and that the distortion is functionally important. The bound geometry is closer to the transition state for the rearrangement (e.g. it has a C-C distance for the forming bond of 2.85 Å, compared with 3.30 Å for the gas-phase minimum energy geometry). The energy of chorismate itself is increased by this distortion (by 13 kcal/mol, from gas-phase calculations), though this is offset by an increase in binding energy, as the total energy of the enzyme-substrate complex falls. The substrate is thus destabilized on binding.
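The approximate reaction coordinate described above is a ratio of two interatomic distances, evaluated from Cartesian coordinates and held at successive values taken from the gas-phase reaction path while the rest of the system is minimized. The sketch below computes such a ratio and a harmonic penalty that could hold it at a target value. Which bond goes in the numerator, the penalty form and its force constant, and the function names are all assumptions made for illustration; the text does not give these details.

```python
import math


def bond_ratio_coordinate(form_a, form_b, break_a, break_b):
    """Ratio of the forming C...C distance to the breaking C-O distance.

    Each argument is an (x, y, z) position in Angstrom. Which bond goes in
    the numerator is an assumption made for this sketch.
    """
    return math.dist(form_a, form_b) / math.dist(break_a, break_b)


def coordinate_restraint(value, target, k=200.0):
    """Hypothetical harmonic penalty k * (value - target)**2 that holds the
    coordinate at one value taken from the gas-phase reaction path while
    all other degrees of freedom are minimized."""
    return k * (value - target) ** 2


if __name__ == "__main__":
    # Illustrative geometry only: forming C...C 3.0 A, breaking C-O 1.5 A.
    ratio = bond_ratio_coordinate((0.0, 0.0, 0.0), (3.0, 0.0, 0.0),
                                  (0.0, 0.0, 0.0), (0.0, 1.5, 0.0))
    print(f"coordinate = {ratio:.2f}")
```

A single combined coordinate like this tracks both bonds at once, which is convenient for a concerted pericyclic reaction where the C-C bond forms as the C-O bond breaks.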
The structure of the transition state itself was found to be very similar to the transition state structure for the gas-phase reaction (an RMS difference of only 0.08 Å). Distortion of chorismate contributes significantly to lowering the barrier. The calculated barrier to reaction in chorismate mutase was 17.8 kcal/mol, compared with 42 kcal/mol in the gas phase. Factors other than substrate distortion also play an important part in reducing the barrier to reaction in the enzyme: important interactions were identified by a simple decomposition analysis (as described in sections 6.1 and 6.2 above). It was found that Glu78 and Arg90 specifically stabilize the transition state, relative to the bound substrate [8]. Overall, therefore, catalysis in chorismate mutase can be rationalized in terms of a combination of substrate strain and transition state stabilization. While it is possible to analyse all these catalytic effects as arising from maximal binding in the enzyme being achieved at the transition state, it appears useful to separate the different types of contribution. The possible role of substrate destabilization/distortion or 'strain' in lowering the barrier to reaction in enzyme reactions, as put forward by Haldane [219], and invoked in mechanisms such as that proposed for lysozyme [15], has been controversial [1]. QM/MM results for chorismate mutase [8] apparently provide a clear example of a role for strain in catalysis by this enzyme, the first such demonstration.
7. CONCLUSIONS
QM/MM methods have shown themselves to be useful tools for the study of enzyme reaction mechanisms. They have demonstrated their worth in identifying catalytic functions for active site residues (such as a conserved proline in PHBH [49,50]), in addressing mechanistic questions such as distinguishing between alternative possible reaction intermediates (e.g. comparing the enol and enolate of acetyl-CoA in citrate synthase [6,51]), and in suggesting catalytic principles (such as strain and destabilization of the substrate in chorismate mutase [8]). Calculations can be carried out at ab initio, semiempirical or density-functional QM levels. Transition state structures can be optimized [67,74,84,167], and molecular dynamics simulations can be carried out [57]. Free energy differences, such as activation free energies, can be calculated [54,72,162,176], as can quantum effects such as tunnelling and zero-point corrections [67]. More approximate, less computer intensive QM/MM methods (such as semiempirical QM/MM) have an important role, as they allow more extensive simulations to be performed (e.g. dynamics simulations, conformational sampling and detailed exploration of potential energy surfaces), an important factor given the complex structural and dynamic behaviour of proteins. Ab initio QM/MM calculations are highly computationally demanding [6,19,121], but will be required for some systems, and can be used to test more approximate approaches. QM/MM methods do require some care in their application, for example in the choice of QM system and the other practical aspects discussed in section 5, and in the nature of the QM/MM partitioning and interactions, as outlined in section 2 above.
Their application is not yet as standardized as that of purely MM or purely QM methods. For any given application, the performance of the QM/MM model should be tested. QM/MM methods are necessarily hybrid, and it is not always clear beforehand which coupling schemes may be most appropriate, and consistent with both the QM method and the MM force field. Testing of different approaches for QM/MM combination will therefore continue to be important. Possible areas for improvement include the treatment of the QM/MM junction where a covalently bonded molecule is partitioned between the two regions, and MM representations going beyond the simple invariant point charge model. Methods to reduce the computational demand of QM/MM simulations are also being developed, including mapping from simulations run with more approximate potential functions [176], and methods to reduce the number of QM energy or gradient calculations required in a simulation through perturbation theory estimates [220,221] or systematic interpolation [98,222]. In a simulation of a solvated protein, much computational effort is spent calculating the behaviour of the solvent. Models which include solvation in an average sense, such as continuum solvation models [223], therefore promise to allow more extensive simulations than explicit solvent models (e.g. on longer timescales, giving better sampling), while including important effects such as dielectric screening. Parameterization of simple QM models (e.g. semiempirical parameters optimized for a particular reaction [162,208,209]) can allow a good combination of speed and accuracy. The QM/MM philosophy generally allows great flexibility in the description of the system. The power and potential of the QM/MM approach make it clear that these methods will become increasingly important for investigations of the fundamental processes of enzyme catalysis.
ACKNOWLEDGEMENTS
The author would like to thank his collaborators in the work described here. He is an Engineering and Physical Sciences Research Council (EPSRC, UK) Advanced Research Fellow, and thanks the BBSRC (UK) and the Wellcome Trust for support of his research.
REFERENCES
1. A. Fersht, Enzyme Structure and Mechanism, 2nd ed., Freeman, New York, 1985. 2. W.P. Jencks, Catalysis in Chemistry and Enzymology, Dover, New York, 1987. 3. B.G. Miller, T.W. Traut, and R. Wolfenden, J. Am. Chem. Soc., 120 (1998), 2666. 4. R. Wolfenden, X.D. Lu, and G. Young, J. Am. Chem. Soc., 120 (1998), 6814. 5. L. Pauling, Am. Sci., 36 (1948), 51. 6. A.J. Mulholland, P.D. Lyne, and M. Karplus, J. Am. Chem. Soc., 122 (2000), 534. 7. A.J. Mulholland and W.G. Richards, J. Phys. Chem. B, 102 (1998), 6635. 8. P.D. Lyne, A.J. Mulholland, and W.G. Richards, J. Am. Chem. Soc., 117 (1995), 11345. 9. C.L. Brooks, III, M. Karplus, and B.M. Pettitt, Proteins: A Theoretical Perspective of Dynamics, Structure and Thermodynamics, Wiley, New York, 1988. 10. B.J. Bahnson, D.-H. Park, K. Kim, B.V. Plapp, and J.P. Klinman, Biochemistry, 32 (1993), 5503. 11. T. Jonsson, M.H. Glickman, S. Sun, and J.P. Klinman, J. Am. Chem. Soc., 118 (1996), 10319. 12. A. Warshel, Computer Modeling of Chemical Reactions in Enzymes and Solutions, Wiley, New York, 1991. 13. M. Karplus, J.D. Evanseck, D. Joseph, P.A. Bash, and M.J. Field, Faraday Discuss., 93 (1992), 239.
14. E.B. Nickbarg, R.C. Davenport, G.A. Petsko, and J.R. Knowles, Biochemistry, 27 (1988), 5948. 15. C.C.F. Blake, et al., Proc. Roy. Soc. ser B, 167 (1967), 378. 16. G.J. Davies, et al., Biochemistry, 37 (1998), 11707. 17. A.J. Mulholland, G.H. Grant, and W.G. Richards, Protein Engng., 6 (1993), 133. 18. A.J. Mulholland and M. Karplus, Biochem. Soc. Trans., 24 (1996), 247. 19. A.J. Mulholland and W.G. Richards, in Transition State Modeling for Catalysis, D.G. Truhlar and K. Morokuma (eds), American Chemical Society, Washington, DC p. 448-461 (ACS Symposium Series 721), 1999. 20. A.D. MacKerell, et al., J. Phys. Chem. B, 102 (1998), 3586. 21. W.D. Cornell, et al., J. Am. Chem. Soc., 117 (1995), 5179. 22. W.R.P. Scott, et al., J. Phys. Chem. A, 103 (1999), 3596. 23. D. Lim, J. Jenson, M.P. Repasky, and W.L. Jorgensen, in Transition State Modeling for Catalysis, D.G. Truhlar and K. Morokuma (eds), American Chemical Society, Washington, DC p. 74-85 (ACS Symposium Series 721), 1999. 24. W.J. Hehre, L. Radom, P.v.R. Schleyer, and J.A. Pople, Ab Initio Molecular Orbital Theory, Wiley, New York, 1986. 25. R.G. Parr and W. Yang, Density-Functional Theory of Atoms and Molecules, Oxford University Press, Oxford, 1989. 26. J.I. Steinfeld, J.S. Francisco, and W.L. Hase, Chemical Kinetics and Dynamics, Prentice Hall, Englewood Cliffs, NJ, 1989. 27. C. Amovilli, et al., Advances in Quantum Chemistry, 32 (1999), 227. 28. W. Yang and T.-S. Lee, J. Chem. Phys., 103 (1995), 5674. 29. M.J.S. Dewar and W. Thiel, J. Am. Chem. Soc., 99 (1977), 4899. 30. M.J.S. Dewar, E.G. Zoebisch, E.F. Healy, and J.J.P. Stewart, J. Am. Chem. Soc., 107 (1985), 3902. 31. J.J.P. Stewart, J. Comput. Chem., 10 (1989), 209. 32. T.-S. Lee, D.M. York, and W. Yang, J. Chem. Phys., 105 (1996), 2744. 33. S.L. Dixon and K.M. Merz, J. Chem. Phys., 107 (1997), 879. 34. S.J. Titmuss, P.L. Cummins, A.A. Bliznyuk, A.P. Rendell, and J.E. Gready, Chem. Phys. Lett., 320 (2000), 169. 35. A.J. Mulholland and W.G.
Richards, Int. J. Quantum Chem., 51 (1994), 161. 36. A.J. Mulholland and W.G. Richards, J. Mol. Struct. (Theochem), 429 (1998), 13. 37. J. Aqvist and A. Warshel, Chem. Rev., 93 (1993), 2523. 38. P. Grochowski, B. Lesyng, P. Bala, and A. McCammon, Int. J. Quantum Chem., 60 (1996), 1143. 39. A. Warshel and J. Bentzien, in Transition State Modeling for Catalysis, D.G. Truhlar and K. Morokuma, (eds.), American Chemical Society, Washington DC p. 489-499 (ACS Symposium Series 721), 1999. 40. J. Aqvist and M. Fothergill, J. Biol. Chem., 271 (1996), 10010. 41. M.J. Field, P.A. Bash, and M. Karplus, J. Comput. Chem., 11 (1990), 700. 42. T.Z. Mordasini and W. Thiel, Chimia, 52 (1998), 288. 43. M.A. Cunningham and P.A. Bash, Biochimie, 79 (1997), 687. 44. A. Warshel and M. Levitt, J. Mol. Biol., 103 (1976), 227. 45. U.C. Singh and P.A. Kollman, J. Comput. Chem., 7 (1986), 718. 46. B. Waszkowycz, I.H. Hillier, N. Gensmantel, and D.W. Payling, J. Chem. Soc., Perkin Trans. 2, (1991), 225.
47. B. Waszkowycz, I.H. Hillier, N. Gensmantel, and D.W. Payling, J. Chem. Soc., Perkin Trans. 2, (1991), 2025. 48. P.A. Bash, et al., Biochemistry, 30 (1991), 5826. 49. L. Ridder, A.J. Mulholland, J. Vervoort, and I.M.C.M. Rietjens, J. Am. Chem. Soc., 120 (1998), 7641. 50. L. Ridder, A.J. Mulholland, I.M.C.M. Rietjens, and J. Vervoort, J. Mol. Graph. Modell., 17 (1999), 163. 51. A.J. Mulholland and W.G. Richards, Proteins, 27 (1997), 9. 52. Y.S. Lee, M. Hodoscek, B.R. Brooks, and P.F. Kador, Biophys. Chem., 70 (1998), 203. 53. P. Varnai, W.G. Richards, and P.D. Lyne, Proteins, 37 (1999), 218. 54. H. Liu, F. Müller-Plathe, and W.F. van Gunsteren, J. Mol. Biol., 261 (1996), 454. 55. D.C. Chatfield, K.P. Eurenius, and B.R. Brooks, J. Mol. Struct. (Theochem), 423 (1998), 79. 56. U. Röthlisberger, in Combined Quantum Mechanical and Molecular Mechanical Methods, J. Gao and M.A. Thompson, (eds.), American Chemical Society, Washington DC p. 264-274, (ACS Symposium Series 712), 1998. 57. D. Hartsough and K.M. Merz, Jr., J. Phys. Chem., 99 (1995), 11266. 58. V. Moliner, A.J. Turner, and I.H. Williams, Chem. Commun., 14 (1997), 1271. 59. V. Moliner, J. Andres, M. Oliva, V.S. Safont, and O. Tapia, Theor. Chem. Acc., 101 (1999), 228. 60. S. Ranganathan and J.E. Gready, J. Phys. Chem. B, 101 (1997), 5614. 61. M.A. Cunningham, L.L. Ho, D.T. Nguyen, R.E. Gillilan, and P.A. Bash, Biochemistry, 36 (1997), 4800. 62. M.J. Harrison, N.A. Burton, and I.H. Hillier, J. Am. Chem. Soc., 119 (1997), 12285. 63. W.G. Han, E. Tajkhorshid, and S. Suhai, J. Biomol. Struc. Dyn., 16 (1999), 1019. 64. C. Alhambra, L. Wu, Z.Y. Zhang, and J.L. Gao, J. Am. Chem. Soc., 120 (1998), 3858. 65. N.A. Burton, M.J. Harrison, J.C. Hart, I.H. Hillier, and D.W. Sheppard, Faraday Discuss., 110 (1998), 463. 66. J.C. Hart, N.A. Burton, I.H. Hillier, M.J. Harrison, and P. Jewsbury, Chem. Comm., (1997), 1431. 67. C. Alhambra, J.L. Gao, J.C. Corchado, J. Villa, and D.G. Truhlar, J. Am. Chem. Soc., 121 (1999), 2253.
68. J. Pawlak, M.H. O'Leary, and P. Paneth, J. Mol. Struct. (Theochem), 454 (1998), 69. 69. J. Pawlak, M.H. O'Leary, and P. Paneth, in Transition State Modeling for Catalysis, D.G. Truhlar and K. Morokuma, (eds.), American Chemical Society, Washington DC p. 462-472 (ACS Symposium Series 721), 1999. 70. U. Röthlisberger et al., in preparation. 71. J.A. Barnes and I.H. Williams, Biochem. Soc. Trans., 24 (1996), 263. 72. A. Thomas, D. Jourand, C. Bret, P. Amara, and M.J. Field, J. Am. Chem. Soc., 121 (1999), 9693. 73. P. Amara, A. Volbeda, J.C. Fontecilla-Camps, and M.J. Field, J. Am. Chem. Soc., 121 (1999), 4468. 74. N.A. Burton, et al., in Transition State Modeling for Catalysis, D.G. Truhlar and K. Morokuma, (eds.), American Chemical Society, Washington DC, p. 401-410 (ACS Symposium Series 721), 1999. 75. G. Colombo, G. Ottolina, G. Carrea, and K.M. Merz, Chem. Comm., (2000), 559. 76. V.V. Vasilyev, J. Mol. Struct. (Theochem), 110 (1994), 129.
77. G. Schurer, H. Lanig, and T. Clark, J. Phys. Chem. B, 104 (2000), 1349. 78. J. Pitarch, J.-L. Pascual-Ahuir, E. Silla, and I. Tunon, J. Chem. Soc., Perkin Trans. 2, (2000), 761. 79. J.C. Hart, D.W. Sheppard, I.H. Hillier, and N.A. Burton, Chem. Comm., (1999), 79. 80. U. Ryde, Protein Sci., 4 (1995), 1124. 81. S. Antonczak, G. Monard, M.F. Ruiz-Lopez, and J.L. Rivail, J. Am. Chem. Soc., 120 (1998), 8825. 82. N. Wu, Y.R. Mo, J.L. Gao, and E.F. Pai, Proc. Natl. Acad. Sci. USA, 97 (2000), 2017. 83. Y.K. Zhang, H.Y. Liu, and W.T. Yang, J. Chem. Phys., 112 (2000), 3483. 84. Q. Cui and M. Karplus, J. Chem. Phys., 112 (2000), 1133. 85. R. Castillo, J. Andres, and V. Moliner, J. Am. Chem. Soc., 121 (1999), 12140. 86. P.L. Cummins and J.E. Gready, J. Comput. Chem., 19 (1998), 977. 87. A.H. Elcock, P.D. Lyne, A.J. Mulholland, A. Nandra, and W.G. Richards, J. Am. Chem. Soc., 117 (1995), 4706. 88. T.R. Furlani and J.L. Gao, J. Org. Chem., 61 (1996), 5492. 89. D. Wei and D.R. Salahub, J. Chem. Phys., 101 (1994), 7633. 90. J. Gao, Acc. Chem. Res., 29 (1996), 298. 91. A.H. deVries, et al., J. Phys. Chem. B, 103 (1999), 6133. 92. J.R. Shoemaker, L.W. Burggraf, and M.S. Gordon, J. Phys. Chem. A, 103 (1999), 3245. 93. T.K. Woo, P.M. Margl, P.E. Blochl, and T. Ziegler, J. Phys. Chem. B, 101 (1997), 7877. 94. J. Gao and X. Xia, Science, 258 (1991), 631. 95. M. Garcia-Viloca, A. Gonzalez-Lafont, and J.M. Lluch, J. Am. Chem. Soc., 121 (1999), 9198. 96. P.L. Cummins and J.E. Gready, J. Comput. Chem., 18 (1997), 1496. 97. G.A. Kaminski and W.L. Jorgensen, J. Phys. Chem. B, 102 (1998), 1787. 98. C.D. Berweger, W.F. van Gunsteren, and F. Müller-Plathe, J. Chem. Phys., 108 (1998), 8773. 99. I.P. Mercer, I.R. Gould, and D.R. Klug, J. Phys. Chem. B, 103 (1999), 7720. 100. R.D.J. Froese and K. Morokuma, Chem. Phys. Lett., 305 (1999), 419. 101. M.A. Thompson and G.K. Schenter, J. Phys. Chem., 99 (1995), 6374. 102. M.A. Thompson, J. Phys. Chem., 100 (1996), 14492. 103. T. Clark, A. Alex, B.
Beck, P. Gedeck, and H. Lanig, J. Mol. Model., 5 (1999), 1. 104. P.E. Sinclair, A. de Vries, P. Sherwood, C.R.A. Catlow, and R.A. van Santen, J. Chem. Soc.-Faraday Trans., 94 (1998), 3401. 105. M.J. Ramos, et al., Int. J. Quantum Chem., 74 (1999), 299. 106. J.D. Marechal, et al., J. Comput. Chem., 21 (2000), 282. 107. A. Cartier, et al., Theor. Chem. Acc., 101 (1999), 241. 108. A. Warshel and M. Karplus, J. Am. Chem. Soc., 94 (1972), 5612. 109. B.D. Wladowski, M. Krauss, and W.J. Stevens, J. Am. Chem. Soc., 117 (1995), 10537. 110. M. Krauss, N. Luo, R. Nirmala, and R. Osman, in Transition State Modeling for Catalysis, D.G. Truhlar and K. Morokuma, (eds.), American Chemical Society, Washington DC p. 424-438, (ACS Symposium Series 721), 1999. 111. S. Dapprich, I. Komaromi, K.S. Byun, K. Morokuma, and M.J. Frisch, J. Mol. Struct. (Theochem), 462 (1999), 1. 112. D. Bakowies and W. Thiel, J. Phys. Chem., 100 (1996), 10580. 113. F. Maseras and K. Morokuma, J. Comput. Chem., 16 (1995), 1170. 114. T. Matsubara, S. Sieber, and K. Morokuma, Int. J. Quantum Chem., 60 (1996), 1101.
115. T.K. Woo, L. Cavallo, and T. Ziegler, Theor. Chem. Acc., 100 (1998), 307. 116. B.R. Brooks, et al., J. Comput. Chem., 4 (1983), 187. 117. G.G. Ferenczy, P.J. Winn, and C.A. Reynolds, J. Phys. Chem. A, 101 (1997), 5446. 118. S.R. Gooding, et al., J. Comput. Chem., 21 (2000), 478. 119. J.L. Gao, J. Comput. Chem., 18 (1997), 1061. 120. S.P. Greatbanks, J.E. Gready, A.C. Limaye, and A.P. Rendell, Proteins, 37 (1999), 157. 121. P.D. Lyne, M. Hodoscek, and M. Karplus, J. Phys. Chem. A, 103 (1999), 3462. 122. R.V. Stanton, D.S. Hartsough, and K.M. Merz, J. Comput. Chem., 16 (1995), 113. 123. J.A. McCammon and S.C. Harvey, Dynamics of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, 1987. 124. J.L. Gao and C. Alhambra, J. Chem. Phys., 107 (1997), 1212. 125. A.T. Hadfield and A.J. Mulholland, Int. J. Quant. Chem. (Biophys. Q.), 73 (1999), 137. 126. L.L. Ho, A.D. MacKerell, Jr., and P.A. Bash, J. Phys. Chem., 100 (1996), 4466. 127. M. Freindorf and J.L. Gao, J. Comput. Chem., 17 (1996), 386. 128. I.B. Bersuker, M.K. Leong, J.E. Boggs, and R.S. Pearlman, Int. J. Quantum Chem., 63 (1997), 1051. 129. P. Sherwood, J. Mol. Graph. Modell., 16 (1998), 275. 130. W. Thiel, J. Mol. Struct. (Theochem), 398 (1997), 1. 131. I. Alkorta, H.O. Villar, and G.A. Arteca, J. Comput. Chem., 14 (1993), 530. 132. J.J.P. Stewart, J. Computer-Aided Mol. Design, 4 (1990), 1. 133. P.B. Karadakov and K. Morokuma, Chem. Phys. Lett., 317 (2000), 589. 134. V. Thery, D. Rinaldi, J.L. Rivail, B. Maigret, and G.G. Ferenczy, J. Comput. Chem., 15 (1994), 269. 135. G.G. Ferenczy, G. Naray-Szabo, and P. Varnai, Int. J. Quantum Chem., 75 (1999), 215. 136. G.G. Ferenczy, J.L. Rivail, P.R. Surjan, and G. Naray-Szabo, J. Comput. Chem., 13 (1992), 830. 137. G.G. Ferenczy, G.I. Csonka, G. Naray-Szabo, and J.G. Angyan, J. Comput. Chem., 19 (1998), 38. 138. G. Naray-Szabo, Russian J. Phys. Chem., 74 (2000), 34. 139. X. Assfeld and J.L. Rivail, Chem. Phys. Lett., 263 (1996), 100. 140. J.L. Gao, P. Amara, C. Alhambra, and M.J. Field, J. Phys.
Chem. A, 102 (1998), 4714. 141. D.M. Philipp and R.A. Friesner, J. Comput. Chem., 20 (1999), 1468. 142. T.A. Wesolowski and A. Warshel, J. Phys. Chem., 97 (1993), 8050. 143. M.D. Ermolaeva, A. van der Vaart, and K.M. Merz, J. Phys. Chem. A, 103 (1999), 1868. 144. HyperChem 4.0, Hypercube, Inc., Gainesville, Florida, USA, 1997. 145. N. Reuter, A. Dejaegere, B. Maigret, and M. Karplus, J. Phys. Chem. A, 104 (2000), 1720. 146. Y.K. Zhang, T.S. Lee, and W.T. Yang, J. Chem. Phys., 110 (1999), 46. 147. I. Antes and W. Thiel, J. Phys. Chem. A, 103 (1999), 9290. 148. M. Eichinger, P. Tavan, J. Hutter, and M. Parrinello, J. Chem. Phys., 110 (1999), 10452. 149. K.P. Eurenius, D.C. Chatfield, B.R. Brooks, and M. Hodoscek, Int. J. Quantum Chem., 60 (1996), 1189. 150. I. Antes and W. Thiel, in Combined Quantum Mechanical and Molecular Mechanical Methods, J. Gao and M.A. Thompson, (eds.), American Chemical Society, Washington DC p. 50-65, (ACS Symposium Series 712), 1998. 151. M.W. Schmidt, et al., J. Comput. Chem., 14 (1993), 1347.
152. R.D. Amos, et al., CADPAC, The Cambridge Analytic Derivatives Package, Cambridge, UK, 1995. 153. M.J. Frisch, et al., Gaussian 94, Gaussian Inc., Pittsburgh, PA, 1995. 154. W.L. Jorgensen, D.S. Maxwell, and J. Tirado-Rives, J. Am. Chem. Soc., 118 (1996), 11225. 155. R.V. Stanton, L.R. Little, and K.M. Merz, J. Phys. Chem., 99 (1995), 17344. 156. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, and M.L. Klein, J. Chem. Phys., 79 (1983), 926. 157. H. Liu and Y. Shi, J. Comput. Chem., 15 (1994), 1311. 158. Y.Q. Tu and A. Laaksonen, J. Chem. Phys., 111 (1999), 7519. 159. D. Bakowies and W. Thiel, J. Comput. Chem., 17 (1996), 87. 160. J. Gao, J. Phys. Chem., 96 (1992), 537. 161. J. Gao and M. Freindorf, J. Phys. Chem. A, 101 (1997), 3182. 162. P.A. Bash, L.L. Ho, A.D. MacKerell, Jr., D. Levine, and P. Hallstrom, Proc. Natl. Acad. Sci. USA, 93 (1996), 3698. 163. W.F. van Gunsteren and M. Karplus, J. Comput. Chem., 1 (1980), 266. 164. P. Derreumaux, G. Zhang, T. Schlick, and B. Brooks, J. Comput. Chem., 15 (1994), 532. 165. O. Farkas and H.B. Schlegel, J. Chem. Phys., 111 (1999), 10806. 166. S. Patchkovskii and W. Thiel, J. Comput. Chem., 17 (1996), 1318. 167. A.J. Turner, V. Moliner, and I.H. Williams, Phys. Chem. Chem. Phys., 1 (1999), 1323. 168. O.S. Smart, Chem. Phys. Lett., 222 (1994), 503. 169. R. Czerminski and R. Elber, Int. J. Quantum Chem., Quantum Chem. Symp., 24 (1990), 167. 170. R. Elber and M. Karplus, Chem. Phys. Lett., 139 (1987), 375. 171. R. Elber and M. Karplus, Chem. Phys. Lett., 311 (1999), 335. 172. R. Elber and D. Shalloway, J. Chem. Phys., 112 (2000), 5539. 173. O.S. Smart and J.M. Goodfellow, Molecular Simulation, 14 (1995), 291. 174. S. Fischer and M. Karplus, Chem. Phys. Lett., 194 (1992), 252. 175. S. Fischer, S. Michnick, and M. Karplus, Biochemistry, 32 (1993), 13830. 176. R.P. Muller and A. Warshel, J. Phys. Chem., 99 (1995), 17516. 177. E. Neria, S. Fischer, and M. Karplus, J. Chem. Phys., 105 (1996), 1902. 178. B.J.
Gertner, R.M. Whitnell, K.R. Wilson, and J.T. Hynes, J. Am. Chem. Soc., 112 (1991), 8925. 179. R.M. Whitnell and K.R. Wilson, in Reviews in Computational Chemistry, K.B. Lipkowitz and D.B. Boyd, (eds.), VCH Publishers, Inc., New York, p. 67-148, 1993. 180. E. Neria and M. Karplus, Chem. Phys. Lett., 267 (1997), 23. 181. M. Gerstein, A.M. Lesk, and C. Chothia, Biochemistry, 33 (1994), 6739. 182. J.-K. Hwang and A. Warshel, J. Am. Chem. Soc., 118 (1996), 11745. 183. S.H. Northrup, M.R. Pear, C.-Y. Lee, J.A. McCammon, and M. Karplus, Proc. Natl. Acad. Sci. U.S.A., 79 (1982), 4035-4039. 184. I. Ghosh and J.A. McCammon, J. Phys. Chem., 91 (1987), 4878. 185. J.A. McCammon, C.-Y. Lee, and S.H. Northrup, J. Am. Chem. Soc., 105 (1983), 2232. 186. S.H. Northrup and J.A. McCammon, J. Am. Chem. Soc., 106 (1984), 930. 187. T. Lazaridis, D.J. Tobias, C.L. Brooks III and M.E. Paulaitis, J. Chem. Phys., 95 (1991), 7612. 188. T. Lazaridis and M.E. Paulaitis, J. Am. Chem. Soc., 116 (1994), 1546.
189. R. Car and M. Parrinello, Phys. Rev. Lett., 55 (1985), 2471. 190. U. Röthlisberger and P. Carloni, Int. J. Quantum Chem., 73 (1999), 209. 191. R.B. Gerber and R. Alimi, Isr. J. Chem., 31 (1991), 383. 192. H. Decornez and S. Hammes-Schiffer, Isr. J. Chem., 39 (1999), 397. 193. S. Hammes-Schiffer, J. Phys. Chem. A, 102 (1998), 10443. 194. S.R. Billeter and W.F. van Gunsteren, Computer Phys. Comm., 107 (1997), 61. 195. M.F. Lensink, J. Mavri, and H.J.C. Berendsen, J. Comput. Chem., 20 (1999), 886. 196. H.J.C. Berendsen and J. Mavri, Int. J. Quantum Chem., 57 (1996), 975. 197. J. Antosiewicz, J.M. Briggs, A.H. Elcock, M.K. Gilson, and J.A. McCammon, J. Comput. Chem., 17 (1996), 1633. 198. X.P. Long, J.B. Nicholas, M.F. Guest, and R.L. Ornstein, J. Mol. Struct., 412 (1997), 121. 199. Y.-J. Zheng and K.M. Merz, Jr., J. Comput. Chem., 13 (1992), 1151. 200. M.W. Jurema and G.C. Shields, J. Comput. Chem., 14 (1993), 89. 201. H.S. Rzepa and M. Yi, J. Chem. Soc., Perkin Trans. 2, (1990), 943. 202. L. Turi and J.J. Dannenberg, J. Phys. Chem., 97 (1993), 7899. 203. J.J.P. Stewart, MOPAC 93.00 Manual, Fujitsu Ltd., Tokyo, 1993. 204. C.B. Aakeröy, J. Mol. Struct. (Theochem), 281 (1993), 259. 205. P. Burk and I.A. Koppel, Theor. Chim. Acta, 86 (1993), 417-427. 206. G.I. Csonka and J.G. Angyan, J. Mol. Struct. (Theochem), 393 (1997), 31. 207. M. Kolb and W. Thiel, J. Comput. Chem., 14 (1993), 775. 208. A. Gonzalez-Lafont, T.N. Truong, and D.G. Truhlar, J. Phys. Chem., 95 (1991), 4618. 209. I. Rossi and D.G. Truhlar, Chem. Phys. Lett., 233 (1995), 231. 210. G.H. Peslherbe and W.L. Hase, J. Chem. Phys., 104 (1996), 7882. 211. K. Maeda-Yorita and V. Massey, J. Biol. Chem., 268 (1993), 4134. 212. QUANTA98/CHARMm, Molecular Simulations, Inc., San Diego, CA., 1998. 213. M. Karpusas, D. Holland, and S.J. Remington, Biochemistry, 30 (1991), 6024. 214. W.W. Cleland and M.M. Kreevoy, Science, 264 (1994), 1887. 215. J.A. Gerlt, M.M. Kreevoy, W.W. Cleland, and P.A. Frey, Chemistry and Biology, 4 (1997), 259. 216. A.J. Mulholland and W.G. Richards, J. Mol. Struct. (Theochem), 427 (1998), 175. 217. K.P. Eurenius, D.C. Chatfield, B.R. Brooks, and M. Hodoscek, Int. J. Quantum Chem., 60 (1996), 1189. 218. I.H. Williams, personal communication. 219. J.B.S. Haldane, Enzymes, Longman, Green & Co., London, 1930. 220. T.N. Truong and E.V. Stefanovich, Chem. Phys. Lett., 256 (1996), 348. 221. T.J. Evans and T.N. Truong, J. Comput. Chem., 19 (1998), 1632. 222. C.D. Berweger, W.F. van Gunsteren, and F. Müller-Plathe, J. Comput. Chem., 18 (1997), 11484. 223. S. Boresch, S. Ringhofer, P. Hochtl, and O. Steinhauser, Biophys. Chem., 78 (1999), 43.
L.A. Eriksson (Editor), Theoretical Biochemistry: Processes and Properties of Biological Systems, Theoretical and Computational Chemistry, Vol. 9, © 2001 Elsevier Science B.V. All rights reserved.
Chapter 15
Quinones and quinoidal radicals in photosynthesis*

Ralph A. Wheeler
Department of Chemistry & Biochemistry, University of Oklahoma, 620 Parrington Oval, Room 208, Norman, OK 73019, USA

Quantum chemical studies of quinoidal radicals important in photosynthesis have become paradigms for using computation to infer the structures of radicals from in vivo spectroscopic measurements. Initial tests of density functional-based methods for calculating the structures, vibrations, and spin properties of models for tyrosyl radicals and p-benzosemiquinone radical anion led to further work characterizing the radical anions of plastoquinone, menaquinone, and ubiquinone, the quinones native to the photosynthetic reaction center. Sophisticated structural models incorporating hydrogen bonding and side-chain conformational changes were developed to calculate vibrational frequencies, isotropic hyperfine coupling constants, and hyperfine tensor components. Comparing calculated and experimental spectroscopic data allows structural inferences to be drawn from the spectra of radicals in different environments. Similar methods are already being used to characterize radicals in other proteins and will ultimately be used to study radical reactions in condensed phases, including proteins.
* Supported by the U.S. Department of Energy through grant number DE-FG03-97ER14806, the Oklahoma Center for the Advancement of Science and Technology through OCAST award numbers HN3-011 and H97091, and supercomputer time from the U.S. National Science Foundation/National Resource Allocations Committee through award number MCA96N019. Studies of models for tyrosyl radical were supported by the U.S. National Science Foundation through grant number CHE-9419734.

1. INTRODUCTION
All free energy used by biological systems originates from solar energy stored by photosynthesis in green plants, algae, or photosynthetic bacteria. The fundamental process of photosynthesis may be represented by one deceptively simple chemical reaction:

$$n\,\mathrm{H_2O} + n\,\mathrm{CO_2} + \text{light} \rightarrow (\mathrm{CH_2O})_n + n\,\mathrm{O_2} \tag{1}$$
Hidden within this equation for using light to synthesize carbohydrates from carbon dioxide and water are myriad intermediate steps, mediated by multiple proteins and small molecules.(1) The primary photochemical events implicit in equation (1) involve energy conversion and storage by separating charge across a membrane. In plants, algae, and photosynthetic bacteria, charge separation is accomplished in different protein-pigment complexes that couple two successive one-electron reductions of quinone electron acceptors to proton transfer, thereby establishing a proton gradient across the membrane.(2) The proton gradient drives subsequent energy storage by adenosine triphosphate (ATP) synthesis, and the ATP is later used in carbon-fixation reactions to make carbohydrates. Quinone reduction is a vital step in the reaction sequence, and the reduced forms of the quinones that appear as intermediates in photosynthesis (see Figure 1) include several radicals whose properties have been intensively investigated, both experimentally and computationally. Furthermore, plant photosynthesis incorporates two tyrosyl radicals that display quinoidal resonance forms (Figure 2).

[Figure 1. Quinones native to various photosynthetic reaction centers: (a) ubiquinone-n (numbers in parentheses for UQ-1); (b) menaquinone-n; (c) plastoquinone-n.]

[Figure 2. Different resonance structures for tyrosyl radical; R = alanyl (CH2CH(NH2)COOH).]

The computational characterization of the structures and spectroscopic properties of the quinones and quinoidal radicals generated during photosynthetic charge separation forms the subject of this review. The thermodynamics of quinone reduction has been reviewed elsewhere,(3) and reviews discussing the strengths and limitations of different computational methods and basis sets are also available.(4, 5)

1.1 Plant Photosystem II
In green plants, algae, and cyanobacteria, the primary photochemical events of photosynthesis occur in the protein-pigment complex called photosystem II (PSII). PSII consists of more than ten polypeptide chains and a number of co-factors important for electron transport.(1, 6) The co-factors are believed to be bound to two homologous polypeptides, D1 and D2, each approximately 32 kDa in size. Photoexcitation of the PSII reaction center drives single-electron transfer from the primary electron donor, P680 (probably a dimer of chlorophyll a), to the primary electron acceptor, one of two pheophytin a molecules. The reduced pheophytin passes the electron on to a primary plastoquinone acceptor, QA, to form the semiquinone anion, QA−. QA− then transfers the electron through a non-heme iron to a secondary plastoquinone, QB, to form QB−. After the initial charge separation, P680+ is reduced by a redox-active tyrosine, Z, and the tyrosyl radical, Z•, is in turn reduced by the manganese complex of the oxygen-evolving system (although PSII contains two other redox-active tyrosines, D and M, their functional roles have not yet been elucidated). In a similar reduction cycle, QB− accepts a second electron and two protons and is reduced to its hydroquinone form, QBH2. The hydroquinone QBH2 is loosely bound to the polypeptide D1, which spans the thylakoid membrane, and QBH2 is in equilibrium with a pool of approximately ten plastoquinones. A plastoquinone from the pool replaces QBH2 and regenerates the original reaction center. The overall reaction is
$$2\,\mathrm{H_2O} + 4\,h\nu + 2\,\mathrm{Q_B} \rightarrow 2\,\mathrm{Q_BH_2} + \mathrm{O_2}, \qquad \mathrm{Q_B} = \text{plastoquinone, PQ-}n \tag{2}$$
The reaction center thus evolves oxygen, supplies the reducing equivalents ultimately needed to assimilate CO2 (in the form of NADPH), and pumps protons from the stroma into the lumen of the thylakoid membrane. As noted above, the resulting proton gradient drives ATP synthesis, and the ATP is later used in carbon-fixation reactions to make carbohydrates.
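The QB reduction cycle described above, often called a two-electron gate, can be expressed as a small state machine. The sketch below is a deliberate simplification that assumes:

- each photochemical turnover delivers exactly one electron to QB (via QA− and the non-heme iron);
- the two protons are taken up together with the second electron;
- QBH2 is then exchanged for a quinone from the pool.

The event names `flash` and `exchange` are hypothetical labels. Tallying two complete cycles reproduces the photon and hydroquinone counts of equation (2): four photons per two QBH2 released.

```python
from enum import Enum
from typing import Dict, List, Tuple


class QB(Enum):
    QUINONE = "QB"
    SEMIQUINONE = "QB-"
    HYDROQUINONE = "QBH2"


# (current state, event) -> (next state, protons taken up in that step)
TRANSITIONS: Dict[Tuple[QB, str], Tuple[QB, int]] = {
    (QB.QUINONE, "flash"): (QB.SEMIQUINONE, 0),       # first electron from QA-
    (QB.SEMIQUINONE, "flash"): (QB.HYDROQUINONE, 2),  # second electron plus two protons
    (QB.HYDROQUINONE, "exchange"): (QB.QUINONE, 0),   # QBH2 leaves; a pool quinone binds
}


def run_gate(events: List[str]) -> Tuple[QB, Dict[str, int]]:
    """Apply events in order, starting from bound quinone, and tally photons,
    protons taken up, and QBH2 molecules released to the pool."""
    state = QB.QUINONE
    tally = {"photons": 0, "protons": 0, "qbh2_released": 0}
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not allowed in state {state.value}")
        if event == "flash":
            tally["photons"] += 1  # one charge separation moves one electron to QB
        elif event == "exchange":
            tally["qbh2_released"] += 1
        state, protons = TRANSITIONS[key]
        tally["protons"] += protons
    return state, tally


if __name__ == "__main__":
    final, counts = run_gate(["flash", "flash", "exchange"] * 2)  # two turnovers
    print(final, counts)
```

Rejecting an out-of-order event, such as an exchange while QB− is still bound, mirrors the gating behaviour. Only the fully reduced hydroquinone is released in this simplified scheme.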
1.2 Bacterial Photosynthetic Reaction Centers
Bacterial photosynthetic reaction centers exhibit many structural and functional homologies with reaction centers from green plants.(2, 7, 8) Bacterial reaction centers from Rhodobacter (Rb.) sphaeroides and Rhodopseudomonas (Rps.) viridis contain co-factors similar to those of PSII: a "special pair" of bacteriochlorophylls, P870; two bacteriopheophytins; two bound quinones; and an iron bound to four histidine nitrogens and two oxygens of a glutamate. The secondary quinone QB is ubiquinone (Figure 1a) or menaquinone (Figure 1b) in bacteria, but plastoquinone (Figure 1c) in green plants. The bacterial reaction center consists of only three polypeptides, denoted L (light), M (medium), and H (heavy). Extensive DNA sequence homology between the genes for the two integral membrane subunits L and M of the bacterial reaction center and those for the D1 and D2 subunits of PSII,(9) plus similar hydropathy profiles for reaction center proteins from bacteria and PSII from plants,(10) have led to extensive use of bacterial reaction centers as models for PSII from green plants. X-ray diffraction structures of the photosynthetic reaction centers from Rps. viridis(11) and Rb. sphaeroides(12-14) support the inferred similarities between the folding of bacterial reaction center and PSII proteins, but details of the quinone binding sites apparently differ. Early X-ray diffraction structures of Rb. sphaeroides, for example, imply hydrogen bonding between one carbonyl oxygen of QB and the side chains of both His L190 and Ser L223,(12-14) but subsequent X-ray structures show different QB binding modes, characterized by different non-covalent contacts between QB and the protein.(15) Since details of quinone binding to the photosynthetic reaction center proteins in Rb. sphaeroides, Rps. viridis, and PSII remain topics of debate, the binding of quinoidal radicals in the reaction centers is even more uncertain.
Although quinone structures and short, non-covalent contacts between quinones and proteins are available from X-ray diffraction structures, analogous information for the quinoidal radicals usually must be inferred indirectly from spectroscopic data. The primary spectroscopic methods used to infer the structures, side-chain conformations, and intermolecular contacts of quinoidal
radicals involved in photosynthesis are infrared, Raman, and magnetic resonance spectroscopies. Since quinoidal radicals are usually very reactive in solvents and in the gas phase, spectroscopic data for these radicals in the gas phase or in solvent are frequently unavailable for comparison with spectroscopic measurements of radical-containing reaction centers. Calculations are thus an important complement to experimental spectroscopic measurements because (1) calculations provide spectral data for the isolated radicals and (2) calculated spectral data for radicals with various conformations or intermolecular contacts may be useful for inferring structures or non-covalent interactions between radicals and the protein matrix.
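Point (2) amounts to ranking candidate structural models by how well their calculated spectra match the measured one. The following is a minimal sketch, assuming that calculated and experimental quantities (vibrational frequencies or hyperfine coupling constants) have already been assigned to one another one-to-one. It ranks models by root-mean-square deviation. The model names and numbers in the usage block are hypothetical placeholders, not data from this chapter.

```python
import math
from typing import Dict, List, Sequence


def rms_deviation(calculated: Sequence[float], experimental: Sequence[float]) -> float:
    """Root-mean-square deviation between assigned calculated and measured values."""
    if len(calculated) != len(experimental) or len(calculated) == 0:
        raise ValueError("need equal-length, non-empty lists of assigned values")
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calculated, experimental))
                     / len(calculated))


def rank_models(candidates: Dict[str, Sequence[float]],
                experimental: Sequence[float]) -> List[str]:
    """Model names ordered from best to worst agreement with experiment."""
    return sorted(candidates, key=lambda name: rms_deviation(candidates[name], experimental))


if __name__ == "__main__":
    # Hypothetical placeholder values (e.g. in cm^-1), for illustration only.
    measured = [1480.0, 1400.0, 1260.0]
    models = {
        "isolated radical": [1510.0, 1425.0, 1290.0],
        "one H-bonded carbonyl": [1488.0, 1405.0, 1268.0],
    }
    print(rank_models(models, measured))
```

The hard part in practice is the assignment step, which this sketch takes as given. A mis-assigned band can easily outweigh the differences between candidate models.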
2. TESTS OF COMPUTATIONAL METHODS FOR CALCULATING PROPERTIES OF QUINOIDAL RADICALS
A variety of computational methods are available for modeling the structures, vibrations, and spin properties of quinoidal radicals. Hartree-Fock and post-Hartree-Fock molecular orbital methods are familiar and widely used,(16, 17) but for open-shell species they frequently give values of ⟨S²⟩ larger than the exact value S(S + 1) (0.75 for a doublet). This indicates contamination of the ground-state wavefunction by states of higher spin multiplicity.(18) Where the calculated ⟨S²⟩ is too large, predicted spin properties are certainly suspect, and structures and vibrational frequencies may be suspect as well. The more accurate post-Hartree-Fock methods also require prohibitively long computational times for the large, biochemically important radicals described here, so their use would require very simplified structural models for complex biochemical radicals. Until recently, computational chemists thus faced the uncomfortable yet all-too-familiar dilemma of calculating very accurate properties for oversimplified models, or inaccurate properties for more realistic models of biochemical radicals. By the mid-1990s, however, small-molecule tests of density functional-based calculations had shown that these methods can give accurate structures and properties for radicals containing several atoms. Once density functional methods became generally available, they offered the possibility of constructing more accurate quantum chemical models for the numerous radicals important in biochemistry at a reasonable cost in computer time. This review emphasizes the role of recently developed density functional-based methods in reproducing the structures, vibrations, and spin properties of the quinoidal radicals important in photosynthesis. This choice allows us to focus on the more sophisticated structural models; it does not imply that ab initio MO methods have no place in biochemical modeling.
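The ⟨S²⟩ diagnostic described above is simple to apply. For a state with n unpaired electrons, S = n/2 and the exact value is S(S + 1), which is 0.75 for a doublet radical. The sketch below flags a calculation whose ⟨S²⟩ deviates from that value by more than a chosen tolerance. The 10% relative tolerance and the 0.01 absolute floor are assumed rules of thumb, not thresholds taken from this chapter. The ⟨S²⟩ value itself must be read from the output of whichever quantum chemistry program was used.

```python
def exact_s2(n_unpaired: int) -> float:
    """Exact <S^2> = S(S+1) for a pure spin state with S = n_unpaired / 2."""
    if n_unpaired < 0:
        raise ValueError("number of unpaired electrons cannot be negative")
    s = n_unpaired / 2.0
    return s * (s + 1.0)


def is_spin_contaminated(s2_calc: float, n_unpaired: int,
                         rel_tol: float = 0.10, abs_floor: float = 0.01) -> bool:
    """True if the computed <S^2> deviates from S(S+1) by more than the larger of
    rel_tol * S(S+1) and abs_floor (both thresholds are assumed conventions)."""
    exact = exact_s2(n_unpaired)
    return abs(s2_calc - exact) > max(rel_tol * exact, abs_floor)


if __name__ == "__main__":
    # A doublet radical such as phenoxyl has one unpaired electron.
    for s2 in (0.754, 0.80, 0.95):
        print(s2, is_spin_contaminated(s2, n_unpaired=1))
```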
On the contrary, MO calculations are currently the only way to calculate some
features that depend on electronic excited-state properties, such as electronic g-factors. Ab initio MO methods also allow systematic improvement of the approximations for electron correlation, so they are, and will remain, the methods of choice for testing lower-level methods. The following section in fact describes some early tests of how well density functional-based methods reproduce experimental and ab initio MO-derived structures, vibrations, and spin properties of the quinoidal radicals important in photosynthesis.

2.1 Tyrosyl Radical and its Phenoxyl and p-Cresyl Radical Models
Several different radicals have been used to model tyrosyl radical, shown in Figure 2. Phenoxyl radical (Figure 3a) is the simplest model for tyrosyl radical's side chain, since phenoxyl radical has a hydrogen atom in place of tyrosyl radical's peptide backbone (an "alanyl chain"). Many ab initio molecular orbital(19-24) and density functional-based(24-29) calculations have been published for phenoxyl radical, but to our knowledge only one local density functional study has directly compared the calculated structures, vibrations, and spin densities of phenoxyl and tyrosyl radicals.(25) p-Cresyl radical (p-methylphenoxyl radical, Figure 3b) is a more realistic structural model of tyrosyl radical because its methyl group represents tyrosyl radical's alanyl chain more faithfully than a hydrogen atom does. The relatively free rotation of the methyl group on p-cresyl radical makes calculations difficult, but the methyl hydrogens provide a minimal model for the hyperfine coupling of the hydrogens on tyrosyl radical's alanyl chain. Several calculations of p-cresyl radical's properties have indeed been published,(27, 30-32) but only recently have calculations been performed to compare the structures, vibrations, spin densities, and hyperfine coupling constants of all three radicals.(33) This section summarizes computed structures
:6
.b'-
.b'.
R
R
R
(a) R = H, phenoxyl radical (b) R = methyl,p-cresyl radical Figure 3. Different resonance structures of p-phenoxyl and p-cresyl radicals, models for tyrosyl radical.
661 and properties of phenoxyl radical and p-cresyl radical models for tyrosyl radical. In 1995, Qin and Wheeler published the first tests of density functionalbased methods for the structures, vibrations, and spin properties of a cyclic conjugated radical.(24, 25) Their work presented phenoxyl radical structures, atomic spin densities, and vibrational frequencies calculated by using a variety of density functional (DF), hybrid Hartree-Fock/density functional (HF/DF), and molecular orbital (MO) quantum chemical methods and compared calculated results with previously published data. In addition to providing tests ot several methods,(24) their work also evaluated the accuracy of the phenoxyl radical model for tyrosyl radical.(25) Table 1 compares calculated bond distances for phenoxyl radical (planar, C2v symmetry) with those from the most sophisticated MO calculation available, CASSCF/6-311G(2d,p) calculations.(23) Clearly, bond distances calculated by using the density-functional-based meth,)ds that agree best with the CASSCF calculation are the hybrid HF/DF B3LYP/6-31G(d) and B3P86/6-31G(d) methods. None of the tested methods gave a calculated CO bond distance in excellent agreement with the CASSCF result and, indeed, subsequent DF and HF/DF calculations with larger basis sets make it ~toubtful that the CO bond distance is converged, even with the largest basis sets employed so far.(26) A complete set of calculated vibrational frequencies was also reperted(24, 25) and indicate that each calculation gave a true minimum energy sl:ructure. Comparing calculated and experimental(35-37) vibrational frequencies showed that frequencies calculated by using DF-based methods are, on average, much closer to experimental frequencies than those computed by using MO ~aethods, even after scaling the UHF/6-31G(d) and UMP2/6-31G(d) frequencies to bring them into better agreement with experiment. 
This result was subsequently found to be general, as multiplicative scaling factors for frequencies calculated by using DF-based methods are closer to 1.0 than those for MO methods.(38)

Table 1 Bond distances (Å) for phenoxyl radical, calculated by using CASSCF,(23, 34) UHF, UMP2, and various density functional methods.(24) Carbon atom numbering starts with C1 bonded to the oxygen and proceeds around the ring. Except for the first CASSCF column, all calculations were carried out using the 6-31G(d) basis set.

Bond    CASSCF         CASSCF     UHF     UMP2    SVWN    BLYP    B3LYP   B3P86
        6-311G(2d,p)   6-31G(d)
C1-O    1.228          1.236      1.254   1.225   1.253   1.270   1.251   1.254
C1-C2   1.454          1.455      1.440   1.461   1.445   1.465   1.45?   1.449
C2-C3   1.370          1.377      1.387   1.355   1.373   1.388   1.371   1.375
C3-C4   1.411          1.414      1.409   1.404   1.403   1.420   1.41?   1.407
C2-H    1.073          1.075      1.074   1.085   1.095   1.093   1.086   1.086
C3-H    1.074          1.075      1.075   1.086   1.096   1.094   1.087   1.087
C4-H    1.073          1.073      1.075   1.086   1.096   1.094   1.087   1.086
(? final digit illegible in the source)

Particularly important are the experimentally measured frequencies displayed in Figure 4. According to B3LYP calculations, the CO stretching mode (A1 symmetry) appears at an unscaled frequency of 1498 cm⁻¹ and compares very well with the experimentally measured frequency of 1505 cm⁻¹. One of the two CC stretching modes, calculated at unscaled frequencies of 1459 cm⁻¹ (B2 symmetry) and 1433 cm⁻¹ (A1 symmetry), corresponds to the band observed at 1398 cm⁻¹, whereas the other was not observed. Note also that calculated vibrational frequencies are harmonic frequencies, which may differ from experiment because of anharmonicity in the radical's vibrations or other limitations of the methods or basis sets. Electronic spin density ratios were also calculated to test the ability of the various methods to reproduce relative spin densities derived from ESR experiments.(39, 40) Table 2 shows that DF-based methods give values of <S²> closer to the exact value than MO methods, indicating less contamination of the ground electronic state by higher spin states.(18) Spin density ratios calculated by using Mulliken population analysis(41) for each computational method confirm the alternant pattern of large and small spin densities inferred from ESR experiments for phenoxyl radical. The calculations also imply that spin density ratios from the local DF SVWN method are closest to the experimentally derived spin densities, and those from the UMP2/6-31G(d) method are also close. Hybrid HF/DF methods give spin density ratios of intermediate quality. Limitations of the test include the well-known limitations of Mulliken population analysis, as well as inherent deficiencies of the methods and basis sets being tested.
In addition, spin densities are usually derived from experiment by using hyperfine coupling constants measured by ESR and thus provide only indirect comparisons with calculations. Despite these limitations, the initial work using DF-based methods to characterize phenoxyl radical spawned numerous studies of biologically important radicals, including glycyl radical,(42-45) the indolyl side chain of tryptophanyl radical,(46-49) the imidazolyl side chain of histidinyl radical,(50, 51) and several more detailed studies of models for tyrosyl radical.(26-32)

Table 2 Experimentally derived(39, 40) and calculated(24) ratios of spin densities and projected <S²> values for the phenoxyl radical. Calculations were done using the methods listed.

                 Expt. derived    Calculated
                 Expt.   Expt.    UHF    UMP2   BLYP   SVWN   B3P86  B3LYP
ρpara/ρortho     1.5     1.5      1.0    1.7    1.3    1.4    1.3    1.3
ρpara/ρmeta      5.3     5.5      1.1    4.7    3.3    5.1    2.3    2.4
ρpara/ρoxygen    –       –        1.2    1.8    0.90   0.84   0.95   0.89
Projected <S²>   0.75 (exact)     1.1    1.0    0.75   0.75   0.75   0.75
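One way to make the Table 2 comparison quantitative is sketched below. It is a hypothetical helper, not part of the original studies: each method is scored by the mean absolute deviation of its ρpara/ρortho and ρpara/ρmeta ratios from the mean of the two experimental values, and the excess of each projected <S²> over the exact doublet value S(S+1) = 0.75 is reported. The scoring metric is an assumption chosen for illustration; the numbers are those listed in Table 2.

```python
# Spin-density ratios and projected <S^2> values transcribed from Table 2.
# The scoring metric is an illustrative assumption, not the published analysis.
EXPT = {"para/ortho": (1.5 + 1.5) / 2, "para/meta": (5.3 + 5.5) / 2}

CALC = {
    #          para/ortho, para/meta, projected <S^2>
    "UHF":   (1.0, 1.1, 1.1),
    "UMP2":  (1.7, 4.7, 1.0),
    "BLYP":  (1.3, 3.3, 0.75),
    "SVWN":  (1.4, 5.1, 0.75),
    "B3P86": (1.3, 2.3, 0.75),
    "B3LYP": (1.3, 2.4, 0.75),
}

S = 0.5                    # doublet radical
EXACT_S2 = S * (S + 1)     # exact <S^2> = 0.75


def ratio_error(method: str) -> float:
    """Mean absolute deviation of the two spin-density ratios from experiment."""
    ortho, meta, _ = CALC[method]
    return (abs(ortho - EXPT["para/ortho"]) + abs(meta - EXPT["para/meta"])) / 2


def spin_contamination(method: str) -> float:
    """Excess of the projected <S^2> over the exact doublet value."""
    return CALC[method][2] - EXACT_S2


# Methods ordered from best to worst agreement with the experimental ratios.
ranking = sorted(CALC, key=ratio_error)

if __name__ == "__main__":
    for m in ranking:
        print(f"{m:6s} ratio error {ratio_error(m):.2f}  "
              f"<S^2> excess {spin_contamination(m):+.2f}")
```

Under this simple metric SVWN and UMP2 come out closest to experiment, consistent with the discussion above; how the remaining methods rank depends on the metric chosen.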
[Figure 4 panels: calculated (B3LYP/6-31G(d), unscaled) vs. experimental frequencies, cm⁻¹: 1498 (A1) vs. 1505; 1604 (A1) vs. 1552; 1459 (B2) and 1433 (A1) vs. 1398 (one of the two); 1353 (B2) vs. 1331; 1175 (A1) vs. 1157; 1019 (A1) vs. 1050; 986 vs. 990; 811 vs. 840; 531 (A1) vs. 528]
Figure 4. Selected vibrational modes of phenoxyl radical calculated by using the B3LYP/6-31G(d) method, including mode symmetries and both calculated(24) and experimentally measured frequencies.(35-37)
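Uniform frequency scaling factors such as those discussed above are commonly obtained from a least-squares fit, λ = Σ ω_calc ν_expt / Σ ω_calc². That definition is assumed here; the procedure behind the cited factors may differ in detail. The sketch below applies it to the calculated/experimental pairs from Figure 4 and omits the 1398 cm⁻¹ band, because its assignment to the 1459 or the 1433 cm⁻¹ mode is ambiguous.

```python
import math

# (calculated unscaled B3LYP/6-31G(d), experimental) frequency pairs in cm^-1,
# transcribed from Figure 4; the ambiguous 1398 cm^-1 band is left out.
PAIRS = [(1498, 1505), (1604, 1552), (1353, 1331), (1175, 1157),
         (1019, 1050), (986, 990), (811, 840), (531, 528)]


def least_squares_scale(pairs):
    """Scaling factor lam that minimizes sum((lam * calc - expt)**2)."""
    num = sum(c * e for c, e in pairs)
    den = sum(c * c for c, _ in pairs)
    return num / den


def rms_error(pairs, lam=1.0):
    """Root-mean-square deviation of lam * calc from experiment (cm^-1)."""
    return math.sqrt(sum((lam * c - e) ** 2 for c, e in pairs) / len(pairs))


lam = least_squares_scale(PAIRS)

if __name__ == "__main__":
    print(f"scale factor = {lam:.4f}")
    print(f"RMS error unscaled = {rms_error(PAIRS):.1f} cm^-1, "
          f"scaled = {rms_error(PAIRS, lam):.1f} cm^-1")
```

The fitted factor is very close to 1.0, which matches the observation that DF-based frequencies need little scaling. With only eight pairs, however, the value should not be over-interpreted.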
This initial work on phenoxyl radical has been extended by several groups, who used DF and HF/DF methods to calculate hyperfine coupling constants and components of the hyperfine tensor for various models of tyrosyl radical.(27-32) Because hyperfine coupling constants are measured directly by experiment, they test the computational methods more directly than spin densities do. Furthermore, hyperfine coupling constants are not subject to the limitations of population analysis, since the isotropic hyperfine coupling constant is proportional to the electronic spin density at the nucleus, ρN, according to the formula(52)

$a_0 = \frac{8\pi}{3}\, g\, g_N\, \beta\, \beta_N\, \rho_N$

where a0 is the isotropic hyperfine coupling constant, g is the electronic g factor, β is the electronic Bohr magneton, and gN and βN are the analogous quantities for nucleus N. Although the usual basis sets fail to give particularly accurate hyperfine properties, basis sets augmented in the core region were developed to give especially accurate spin density distributions near the nucleus, and tests show that these basis sets generally give accurate hyperfine coupling parameters. For example, Table 3 compares calculated and experimental proton hyperfine coupling constants for phenoxyl radical and gives predicted heavy-atom hyperfine coupling constants(29) (note that experimental measurements can usually determine only the magnitude, not the sign, of isotropic hyperfine coupling constants). The table shows that HF/DF methods with augmented basis sets provide very accurate proton hyperfine coupling constants and slightly less accurate data for non-hydrogen atoms. Hyperfine coupling varies with the geometry of a radical and with its immediate environment, so comparing calculated and experimentally measured hyperfine coupling constants and hyperfine tensor components allows structural inferences to be drawn.
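In SI units the formula above becomes a0 = (2μ0/3h) g μB gN μN ρN when a0 is expressed as a frequency. The sketch below converts a spin density at the nucleus in atomic units (bohr⁻³) into MHz and gauss using CODATA 2018 constants. It assumes a doublet with the spin density normalized to one unpaired electron and takes g as the free-electron value. The nuclear g values for ¹H, ¹³C and ¹⁷O are standard tabulated values; the function names are hypothetical. As a sanity check, the hydrogen atom (ρN = 1/π bohr⁻³) gives about 1423 MHz, close to the measured 1420 MHz hyperfine splitting.

```python
import math

# CODATA 2018 values (SI units)
MU0 = 1.25663706212e-6          # vacuum permeability, N A^-2
H_PLANCK = 6.62607015e-34       # Planck constant, J s
MU_B = 9.2740100783e-24         # Bohr magneton, J T^-1
MU_N = 5.0507837461e-27         # nuclear magneton, J T^-1
G_E = 2.00231930436             # free-electron g factor (magnitude)
BOHR = 5.29177210903e-11        # Bohr radius, m

# Nuclear g factors; the sign matters (17O is negative, so its coupling flips sign).
G_NUC = {"1H": 5.5856946893, "13C": 1.4048236, "17O": -0.757516}


def fermi_contact_mhz(rho_au: float, nucleus: str, g: float = G_E) -> float:
    """Isotropic (Fermi contact) coupling in MHz from spin density in bohr^-3."""
    rho_si = rho_au / BOHR**3                                   # bohr^-3 -> m^-3
    energy = (2.0 * MU0 / 3.0) * g * MU_B * G_NUC[nucleus] * MU_N * rho_si  # J
    return energy / H_PLANCK / 1e6                              # J -> MHz


def mhz_to_gauss(a_mhz: float, g: float = G_E) -> float:
    """Convert a coupling from MHz to gauss for an electron with g factor g."""
    mhz_per_gauss = g * MU_B * 1e-4 / H_PLANCK / 1e6            # 1 G = 1e-4 T
    return a_mhz / mhz_per_gauss


# Hydrogen-atom check: 1s spin density at the nucleus is 1/pi bohr^-3.
h_atom_mhz = fermi_contact_mhz(1.0 / math.pi, "1H")
h_atom_gauss = mhz_to_gauss(h_atom_mhz)

if __name__ == "__main__":
    print(f"H atom: {h_atom_mhz:.1f} MHz = {h_atom_gauss:.1f} G")
```

The small gap from the measured hydrogen value reflects effects that this nonrelativistic point-nucleus formula ignores. For molecular radicals the dominant error is usually the quality of ρN itself, which is why core-augmented basis sets matter.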
For p-cresyl radical (and p-ethylphenoxyl radical), the dependence of the beta protons' hyperfine coupling constants on the dihedral angle between the phenoxyl ring and the Cβ-Hβ bond was investigated.(31, 32) The expected cosine-squared relation between hyperfine coupling constant and torsional angle was found and verified by comparison with measured proton hyperfine coupling constants and X-ray diffraction structures for tyrosine hydrochloride and ribonucleotide reductase (assuming that the tyrosyl radical maintains the same conformation as the tyrosine residue observed in the X-ray structure). The dependence of proton hyperfine coupling constants on torsional angle also verified inferences made by Babcock et al. concerning the orientation of tyrosyl radical's ring relative to the beta protons in various enzymes, including the reaction center of plant photosystem II (whose ring reportedly makes an angle of approximately 75° with the bond to the alpha carbon).(31) Phenol radical cation as well as various hydrogen-bonded phenoxyl radical models were studied to determine the effect of perturbing the oxygen on spin densities and hyperfine parameters.(30-32, 53) For all systems studied, calculations imply a reduced spin density on the oxygen and increased spin density on the ipso carbon atom, with corresponding modifications of the hyperfine coupling constants. Although these computational results have been used to infer the geometry of hydrogen bonding and, in some cases, to confirm the identity of the hydrogen-bond donor to tyrosyl radicals, some caution is warranted because the models studied may not be the only ones capable of giving excellent agreement with the spectroscopic data. Nonetheless, the quantitative agreement between first-principles calculations and experiment is impressive and implies that computational chemists are poised to make significant contributions to the interpretation of ESR and ENDOR experiments and their relation to radical conformations and non-covalent contacts. In the near future, quantum chemical computations modeling the chemistry of tyrosyl radicals will probably aim to include more accurate models of intermolecular contacts, study the effects of dynamical motions on spectroscopic properties, and attempt to explain the reactions of tyrosyl radicals.

Table 3 Experimental(40) and calculated(29) isotropic hyperfine coupling constants (hfcc's) and calculated spin densities for phenoxyl radical. Calculations were done using the B3LYP/6-31G(d) method.

        hfcc's                 Spin density
Atom    Expt.    Calculated    Calculated
C1      –        17.4          0.41
C2      –        -10.1         -0.17
C6      –        -10.1         -0.17
C3      –        12.7          0.32
C5      –        12.7          0.32
C4      –        -11.5         -0.12
O       –        -11.8         0.44
H1      10.2     -9.7          -0.02
H2      1.9      3.1           0.01
H6      1.9      3.1           0.01
H3      6.6      -7.7          -0.02
H5      6.6      -7.7          -0.02
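The torsional dependence of β-proton couplings described above is usually written in the Heller-McConnell form a(Hβ) = ρC (B0 + B2 cos²θ). Here ρC is the π spin density on the ring carbon bearing the CβH2 group, and θ is the dihedral angle between the Cβ-Hβ bond and the pz axis of that carbon. An angle measured from the ring plane differs from θ by 90°. The sketch below uses assumed illustrative parameters, B0 = 0 G, B2 = 58 G and a hypothetical ρC = 0.38; the function names are ours, and real analyses fit these parameters.

```python
import math

# Illustrative (assumed) parameters for a tyrosyl-like radical.
B0 = 0.0      # G, angle-independent term (often taken as ~0)
B2 = 58.0     # G, angle-dependent term (assumed value)
RHO_C = 0.38  # hypothetical pi spin density on the ring carbon


def beta_proton_hfcc(theta_deg: float, rho=RHO_C, b0=B0, b2=B2) -> float:
    """Heller-McConnell estimate of a(H_beta) in gauss.

    theta_deg is the dihedral between the C_beta-H bond and the p_z axis
    of the ring carbon; theta = 0 gives the maximal coupling.
    """
    return rho * (b0 + b2 * math.cos(math.radians(theta_deg)) ** 2)


def dihedral_from_hfcc(a_gauss: float, rho=RHO_C, b0=B0, b2=B2) -> float:
    """Invert the relation, returning theta in [0, 90] degrees."""
    cos2 = (a_gauss / rho - b0) / b2
    if not 0.0 <= cos2 <= 1.0:
        raise ValueError("coupling outside the range allowed by the parameters")
    return math.degrees(math.acos(math.sqrt(cos2)))


a1 = beta_proton_hfcc(30.0)   # coupling for a proton at 30 degrees

if __name__ == "__main__":
    print(f"a(30 deg) = {a1:.2f} G -> theta = {dihedral_from_hfcc(a1):.1f} deg")
```

The inversion step turns a measured β-proton coupling into a ring orientation. Because cos²θ is symmetric, the angle is recovered only within [0°, 90°], so additional information (for example the second β proton) is needed to fix the conformation.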
2.2
Para-benzoquinone and its Semiquinone Radical Anion.
In 1995-96, Scott Boesch provided the first tests of various density functional-based quantum chemical methods for calculating the structures, vibrations, and spin properties of p-benzoquinones and their semiquinone radical anions.(54-56) Although several DF-based methods give comparably accurate structures and harmonic vibrational frequencies for p-benzoquinone, p-chloranil, and p-fluoranil, the hybrid HF/DF methods were selected because they give a range of properties in particularly good agreement with experiment. For example, Figure 5 compares bond distances determined by electron diffraction(57) and by B3LYP calculations for p-benzoquinone, the parent molecule of the biologically relevant quinones shown in Figure 1. All calculated bond lengths are within three standard deviations of experiment, the usual standard applied to decide that two experimentally determined structures are identical. The C-C bond distance is within 0.006 Å of the experimental value, the CO bond distance is identical with the experimentally determined distance, and the calculated C=C distance differs from experiment by only 0.001 Å. Subsequent work implies that the B3LYP method gives only slightly less accurate structural parameters for quinones related to the parent p-benzoquinone, including ubiquinones (UQ-n),(58) plastoquinones (PQ-n),(59) and menaquinones (MQ-n).(60, 61)

[Figure 5 structures: p-benzoquinone bond distances (Å); C=O 1.225(2) from electron diffraction and 1.225 calculated]
Figure 5. Bond distances for p-benzoquinone (a) determined by electron diffraction(57) and (b) calculated by using the B3LYP/6-31G(d) method.(54) Calculated bond distances are all within three standard deviations of experiment.

One-electron reduction of p-benzoquinone to the p-benzosemiquinone radical anion produces bond length changes whose direction may be predicted from the LUMO of p-benzoquinone, shown in Figure 6. Since the LUMO is antibonding along the CO and C=C bonds, the CO and C=C bond distances are predicted to increase when the orbital is occupied by one (or two) electrons, whereas the C-C single bond distances are predicted to decrease. Thus, frontier orbital arguments agree with the valence bond picture in predicting that p-benzosemiquinone becomes more benzenoid upon one-electron reduction (see Figure 7), a prediction confirmed by the calculated bond distances displayed in Figure 5. The CO distances increase substantially, to 1.263 Å, and the C=C bond distances also increase, to 1.369 Å. As predicted, the CC single bonds of p-benzoquinone contract, to 1.449 Å, in p-benzosemiquinone radical anion. To a first approximation, the added electron enters the π system of p-benzoquinone and should have minimal effect on CH bond distances; indeed, the CH distances increase by only 0.004 Å.

Figure 6. LUMO of p-benzoquinone, determined from RHF calculations.(62)
[Figure 7 structures: p-benzoquinone and resonance structures of its radical anion]
Figure 7. p-Benzoquinone and resonance structures of the p-benzosemiquinone radical anion demonstrating the more benzenoid character of the radical anion.

Table 4 shows that the B3LYP method also gives harmonic vibrational frequencies for p-benzoquinone that differ from experimentally determined values(63-65) by an average absolute magnitude of 40 cm⁻¹ (3.9%, corresponding to a multiplicative scaling factor of 0.961, in agreement with more extensive studies of many different molecules(38, 66)). The calculated, scaled frequency for the antisymmetric B1u CO stretching mode is 1689 cm⁻¹, compared with the experimental value of 1666 cm⁻¹. The symmetric Ag CO stretching mode, calculated at 1688 cm⁻¹, is similarly close to the experimentally measured frequency of 1663 cm⁻¹. Likewise, the symmetric and antisymmetric C=C stretching modes (of Ag and B2u symmetry, respectively) are calculated at 1629 cm⁻¹ and 1599 cm⁻¹, compared with their experimentally measured frequencies of 1657 cm⁻¹ and 1592 cm⁻¹.
So the B3LYP method not only reproduces the relative ordering of these four closely spaced modes, but also gives the actual vibrational frequencies with better accuracy than uniformly scaled Hartree-Fock or MP2 frequencies. Disagreements between calculation and experiment for the analogous modes of tetrachloro- and tetrafluoro-p-benzoquinones,(54) along with subsequent, more detailed studies of p-benzoquinone by Nonella and coworkers,(67, 68) warn us, however, that calculated vibrational frequencies and the relative ordering of these four modes are highly dependent on the calculated force constants. For p-benzosemiquinone radical anion, the B3LYP-derived vibrational modes are shifted to lower frequencies than in neutral p-benzoquinone (see Table 4). The antisymmetric B1u CO stretching mode appears at a calculated, scaled frequency of 1535 cm⁻¹ and was not observed experimentally. The symmetric CO stretch of Ag symmetry appears at 1488 cm⁻¹, 53 cm⁻¹ above the experimentally observed frequency of 1435 cm⁻¹. The two CC stretching modes of Ag and B2u symmetry appear at calculated frequencies of 1643 cm⁻¹ (observed at 1620 cm⁻¹) and 1475 cm⁻¹ (not detected experimentally), respectively. Thus agreement between experimentally measured and calculated, scaled vibrational frequencies is moderately good for p-benzosemiquinone radical anion, and the calculated frequencies shift in the direction implied by the weaker, longer CO and C=C bonds formed upon reduction of p-benzoquinone to its semiquinone anion. Isotropic hyperfine coupling constants and spin densities calculated for p-benzosemiquinone radical anion are compared with experimental results(72) in Table 5. All isotropic hyperfine coupling constants except that for oxygen are within 0.24 G of experiment, whereas the oxygen hyperfine coupling constant is within 18% of its experimentally determined value. The same work compared hyperfine coupling constants calculated with basis sets augmented in the core region, to better represent the electron density near the nucleus (shown in the table), with those calculated by using the 6-31G(d) and 6-311G(d,p) basis sets. These and related tests for various other substituted quinones(55, 56, 58-61) confirm a result obtained previously for smaller molecules: the B3LYP method gives isotropic hyperfine coupling constants in better agreement with experiment when the basis set is augmented in the core region.

Table 4 Approximate mode descriptions and experimentally measured(63-65, 69-71) and scaled, calculated(54) vibrational frequencies (cm⁻¹) of p-benzoquinone (PBQ) and p-benzosemiquinone radical anion (PBSQ). Calculations were done using the B3LYP/6-31G(d) method. Frequencies greater than 1000 cm⁻¹ were scaled by 0.9613 and frequencies less than 1000 cm⁻¹ were scaled by 1.0013.(38)

Sym.  Assignment               PBQ Expt.  PBQ Calc.  PBSQ Expt.  PBSQ Calc.
ag    C-H stretch              3058       3099
b2u   C-H stretch              3062       3096
b3g   C-H stretch              3057       3080
b1u   C-H stretch              3062       3080
b1u   C=O stretch              1666       1689       –           1535
ag    C=O stretch              1663       1688       1435        1488
ag    C=C stretch              1657       1629       1620        1643
b2u   C=C stretch              1592       1597       –           1475
b3g   C-C stretch, C-H bend    1388       1353       –           1416
b1u   C-H bend                 1354       1343       –           1333
b2u   C-C stretch              1299       1277       –           1200
b3g   C-H bend                 1230       1193       –           1215
ag    C-H bend                 1160       1131       –           1113
b2u   C-H bend                 1066       1049       –           1038
b2g   C-H wag                  1018       985        –           947
au    C-H wag                  989        974        –           942
b1u   C-C=C bend               944        952        –           970
b3u   C-H wag                  882        907        –           860
ag    C-C stretch              774        778        –           833
b2g   Ring chair bend          800        799        –           801
b1u   C-C stretch              728        758        –           796
b1g   C-H wag                  766        761        –           774
b3g   C=O bend, C-C stretch    601        603        –           634
b3u   Ring boat bend           505        515        –           520
ag    C-C-C bend               447        456        –           470
b3g   C=O bend                 459        456        –           467
b2u   C=O bend                 409        413        –           392
au    C=C-C bend               330        338        –           394
b2g   C=O chair bend           249        241        –           328
b3u   C=O boat bend            89         101        –           139
PBSQ C-H stretch values whose column and row placement cannot be recovered from the source: 3058, 3052, 3032, 3031. Additional experimental PBSQ bands without recoverable row placement: 1161, 481.
Moreover, hydrogen hyperfine coupling constants are generally reproduced more accurately than those for non-hydrogen atoms.(5) This work has been extended, primarily by O'Malley, to model the effects of non-covalent contacts on hyperfine couplings and to calculate components of the hyperfine tensor.(74-77) O'Malley achieves outstanding quantitative agreement between calculation and experiment for some hydrogen-bonding geometries. In addition, hydrogen bonding by water or various alcohols redistributes spin density from the oxygen and ring carbons to the carbonyl carbon, with a concomitant decrease in the magnitude of this atom's ¹³C isotropic and anisotropic hyperfine couplings.

Table 5 Experimental(72, 73) and calculated(56) isotropic hyperfine coupling constants and spin densities for the radical anion of p-benzoquinone. Hyperfine coupling constants were calculated using the B3LYP/(632141)//6-31G(d) method.

        hfcc's             Spin density
Atom    Expt.    Calc.     Expt.          Calc.
C       -2.13    -2.18     0.149-0.154    0.078
O       -9.46    -7.77     0.161-0.166    0.258
C2      -0.12    -0.08     0.092-0.093    0.086
H       -2.42    -2.18     –              -0.005
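The frontier-orbital argument used above for p-benzoquinone can be reproduced qualitatively with a simple Hückel model. This is a swapped-in technique: the orbital in Figure 6 comes from RHF calculations. The sketch assumes common textbook heteroatom parameters for a carbonyl oxygen, h_O = 1.0 and k_CO = 1.0 (α_O = α + β, β_CO = β), and builds the 8-centre π Hamiltonian. It then uses the product of LUMO coefficients on bonded atoms as the change in π bond order per added electron. A negative value means the bond should lengthen on reduction; a positive value means it should shorten.

```python
import numpy as np

# Atom order: 0 O1, 1 C1, 2 C2, 3 C3, 4 C4, 5 O4, 6 C5, 7 C6
H_O, K_CO = 1.0, 1.0          # assumed Hueckel heteroatom parameters
BONDS = {
    "C1=O": (0, 1), "C4=O": (4, 5),
    "C1-C2": (1, 2), "C2=C3": (2, 3), "C3-C4": (3, 4),
    "C4-C5": (4, 6), "C5=C6": (6, 7), "C6-C1": (7, 1),
}

# Matrix in units of beta: E = alpha + x * beta with beta < 0,
# so a larger x means a lower (more bonding) orbital energy.
M = np.zeros((8, 8))
M[0, 0] = M[5, 5] = H_O
for name, (i, j) in BONDS.items():
    M[i, j] = M[j, i] = K_CO if "O" in name else 1.0

x, C = np.linalg.eigh(M)       # eigenvalues in ascending order
order = np.argsort(-x)         # most bonding orbital first
n_occ = 4                      # 8 pi electrons fill 4 orbitals
lumo = order[n_occ]
c = C[:, lumo]

# Change in pi bond order per electron added to the LUMO
# (invariant to the arbitrary overall sign of the eigenvector).
delta_p = {name: c[i] * c[j] for name, (i, j) in BONDS.items()}

if __name__ == "__main__":
    print(f"LUMO x = {x[lumo]:.4f}")
    for name, dp in delta_p.items():
        trend = "lengthens" if dp < 0 else "shortens"
        print(f"{name:6s} {dp:+.3f}  ({trend})")
```

Even this crude model gives the pattern described in the text: antibonding LUMO character across C=O and C=C, and bonding character across the C-C single bonds. Quantitative bond lengths still require the DF or HF/DF calculations discussed above.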
3. CALCULATED PROPERTIES OF QUINOIDAL RADICALS IMPORTANT IN PHOTOSYNTHESIS.
After initial tests for tyrosyl radical, p-benzoquinone, and p-benzosemiquinone radical anion showed the power of DF-based methods for calculating accurate structures, vibrations, and spin properties, their use to predict the properties of other important radicals grew rapidly. Published HF/DF calculations characterizing the photosynthetic electron transfer cofactors plastoquinone, menaquinone, and ubiquinone are described next.
3.1
Plastoquinones and their Radicals.
Figure 8 compares calculated bond distances of a model for plastoquinone-1 and its radical anion. The model differs from PQ-1 in that the methyl groups on the isoprenoid chain are replaced by hydrogen atoms. Bond distances in the plastoquinone model (PQ) are similar to those calculated for p-benzoquinone. The CO bond lengths for PQ are both 1.228 Å, 0.003 Å longer than the CO bonds in p-benzoquinone. Similarly, the calculated C=C bond distances in PQ are 1.344 Å and 1.356 Å, compared with 1.343 Å in p-benzoquinone. Evidently, the C=C bond adjacent to only one alkyl substituent is nearly identical with that in p-benzoquinone, whereas the C=C bond adjacent to two alkyl groups is 0.013 Å longer. The CC single bonds of PQ differ the most from those in p-benzoquinone and are approximately 0.03 Å longer in PQ.
[Figure 8 structures: calculated bond distances (Å) for (a) the plastoquinone model PQ and (b) its radical anion]
Figure 8. Calculated bond distances for (a) plastoquinone and (b) its radical anion show the more benzenoid character of the anion.
Ring carbon to substituent carbon distances are in the range of typical carbon-carbon single bond distances, 1.504 Å to 1.512 Å. Upon reduction of PQ to PQ•−, the bond distances change in predictable ways, based on the changes calculated for p-benzoquinone and predicted from the nodal structure of its LUMO (Figure 6). The CO bond distances increase to 1.268 Å and 1.270 Å in PQ•−, and the ring C=C distances also expand, to 1.381 Å and 1.372 Å. Once again, the C=C bond bounded by two alkyl groups is the longer of the two. As in p-benzoquinone, the CC single bonds of PQ contract upon one-electron reduction and range from 1.447 Å to 1.463 Å in PQ•−. The exocyclic C-C bonds change very little (0.005 Å to 0.006 Å) in PQ•− compared with PQ. Table 6 presents vibrational frequencies and mode assignments calculated for selected modes of PQ and PQ•−.(59) Because the CO and C=C stretching modes are the most easily detected experimentally for quinones in proteins and are used as diagnostics of quinone-protein interactions, the following discussion focuses on the calculated CO and C=C stretching modes. Comparing the signature CO and C=C stretching modes of PQ with those of p-benzoquinone shows that the lower symmetry of PQ, combined with the change of effective force constants and masses, results in extensive mixing of p-benzoquinone's vibrational modes in PQ. To compare vibrational modes of structurally similar molecules such as p-benzoquinone and PQ, Grafton and Wheeler developed the method of vibrational projection analysis.
The method has been described in detail(29, 78, 79) and used to study the effects of isotopic or chemical substitution, oxidation/reduction, and non-covalent contacts on molecular vibrations.(29, 78-80) Briefly, the method exploits the vector character of vibrational normal modes and projects each normal mode of an "object molecule" onto each normal mode of a "basis molecule". The result (multiplied by 100) gives the percentage of each normal mode of the basis molecule that makes up each mode of the object molecule. Obviously, only those modes of the object molecule (PQ in this example) involving substantial motions of atoms also contained in the basis molecule (p-benzoquinone in this case) are well described by this method. Despite this limitation, vibrational projection analysis has been used extensively to assign vibrational modes of substituted quinones and to compare modes of semiquinone radical anions with those of the parent quinones.(59-61, 80) Vibrational projection analysis shows that the modes of p-benzoquinone assigned as primarily symmetric and antisymmetric CO stretching are spread over at least three different modes of PQ. For neutral PQ, the mode at a scaled, calculated frequency of 1671 cm⁻¹ is a stretching vibration localized mainly on the CO bond farther from the isoprenoid chain. The 1665 cm⁻¹ PQ mode is an antisymmetric CO stretch involving both carbonyls and may be responsible for the experimentally observed shoulder on the peak assigned to PQ-9's CO stretch (PQ-9's CO stretch appears at 1650 cm⁻¹, with a shoulder at 1635 cm⁻¹). The PQ vibration at 1643 cm⁻¹ is a combination of symmetric C=C and CO stretching, while the 1607 cm⁻¹ vibration is an antisymmetric C=C stretch. Either of these modes could correspond to the C=C stretch observed at 1620 cm⁻¹ for PQ-9. The primary difference between the calculated vibrations of PQ and PQ•− is the relatively high frequency of the ring CC antisymmetric stretching mode in PQ•−. In PQ•−, a CO stretching vibration is calculated at 1600 cm⁻¹, with a ring CC antisymmetric stretching mode appearing next, at 1501 cm⁻¹. Slightly lower in frequency, at 1490 cm⁻¹, is the CO antisymmetric stretch. Finally, the mixed ring CC/CO stretching mode is calculated at 1475 cm⁻¹.

Table 6 Approximate mode descriptions, experimentally measured vibrational frequencies (cm⁻¹)(81) of PQ-9, and scaled, calculated harmonic vibrational frequencies (cm⁻¹)(59) for selected normal modes of plastoquinone-1 (PQ) and plastosemiquinone-1 (PQ•−). Calculations were done using the B3LYP/6-31G(d) method. Calculated frequencies were scaled by 0.9614.(38)

Assignment                        PQ Expt.  PQ Calc.  PQ•− Expt.  PQ•− Calc.
CO Stretch                        1650      1671      –           1600
CO Antisymmetric Stretch          1635      1665      –           1490
Ring C=C/C=O Symmetric Stretch    1620      1643      –           1475
Ring CC Antisymmetric Stretch     –         1607      –           1501
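The projection idea can be sketched with NumPy under explicit assumptions. Each normal mode is taken as a mass-weighted Cartesian displacement vector, and the object molecule's modes are assumed to be already restricted to the atoms it shares with the basis molecule, then renormalized. The percentage is taken as 100 times the squared overlap, so that contributions from a complete orthonormal basis sum to 100. These normalization choices and the function name are illustrative assumptions; the published method(29, 78, 79) should be consulted for its exact definitions.

```python
import numpy as np


def projection_percentages(object_modes: np.ndarray, basis_modes: np.ndarray) -> np.ndarray:
    """Percent composition of each object mode in terms of basis modes.

    object_modes: (n_obj, 3N) mass-weighted displacements of the object molecule,
                  already restricted to the N atoms shared with the basis molecule.
    basis_modes:  (n_basis, 3N) mass-weighted normal modes of the basis molecule.
    Returns an (n_obj, n_basis) array whose entry [i, j] is 100 * (q_i . b_j)**2,
    with both vectors normalized first.
    """
    q = object_modes / np.linalg.norm(object_modes, axis=1, keepdims=True)
    b = basis_modes / np.linalg.norm(basis_modes, axis=1, keepdims=True)
    return 100.0 * (q @ b.T) ** 2


# Toy check with random orthonormal "basis modes" (no real molecule implied).
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(6, 6)))   # columns are orthonormal
basis = basis.T                                     # rows = modes
mixed = np.array([0.8 * basis[0] + 0.6 * basis[1]])  # a 64% / 36% mixture
pct = projection_percentages(mixed, basis)

if __name__ == "__main__":
    print(np.round(pct, 1))
```

With real molecules the basis usually contains only the 3N-6 vibrations, and the object modes are truncated to shared atoms. The row sums can therefore fall below 100, which is one symptom of the limitation noted above.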
Isotropic hyperfine coupling constants were also calculated for PQ•− and compared with experiment(82) (see Table 7). Proton hyperfine coupling constants (averaged over methyl and methylene protons) agree moderately well with experimentally measured values. To our knowledge, experimentally measured heavy-atom hyperfine coupling constants are unavailable, but our results are comparable to hyperfine coupling constants estimated from B3LYP/EPR-II//PM3 calculations.(83, 84) Others have expanded on this work to investigate the conformational dependence of calculated methylene proton hyperfine coupling constants, in addition to hydrogen-bonding effects.(85, 86) They find two minimum-energy conformations for an ethyl-chain model of the isoprenyl chain of PQ•−: a global minimum with the terminal carbon perpendicular to the ring and a second, local minimum with the terminal carbon atom in the ring plane. Eriksson and co-workers have studied the effects of hydrogen bonding to, and protonation of, PQ•− on hyperfine coupling.(85) They find that QEtH is an odd-alternant radical with strongly modified atomic spin densities and hyperfine structures compared with QEt•−. Although adding a second proton gives spin and hyperfine properties closer to those of QEt•−, these workers conclude that the hyperfine properties of QEtH and QEtH2•+ are sufficiently different from those of QEt•− to allow clear distinction between the three species. Their calculations relevant to hydrogen bonding were also interpreted to imply hydrogen bonding to both oxygen atoms of Q•− in PSII, one in the plane of the quinone ring and one out of plane. A very careful comparison of calculated anisotropic hyperfine couplings for hydrogen-bonded models of PQ•− (using the B3LYP/EPR-II//PM3 method) with experimental ESR data in alcohol solvents suggests a re-interpretation of the experimental spectra, supports the presence of hydrogen bonding to both oxygen atoms of PQ•−, and implies that in-plane hydrogen bonding can occur with O1, but that substituents near O4 force its hydrogen-bond donors out of the ring plane.(84) Although these studies of hydrogen bonding to quinoidal radicals in solvent are potentially valuable, they must be extrapolated to protein environments with great care, because they do not establish whether the calculated spectroscopic properties are unique to the putative hydrogen-bond geometry or to the hydrogen-bond donor group used in the calculation.

Table 7 Experimental(82) and calculated(59) isotropic hyperfine coupling constants and calculated spin densities for PQ•−. Hyperfine coupling constants were estimated using B3LYP/(632141)//6-31G(d) calculations.

             hfcc's              Spin density
Atom         Expt.     Calc.     Calc.
C1           –         -3.49     0.06
C2           –         +0.69     0.11
C3           –         -0.79     0.07
C4           –         -2.13     0.10
C5           –         +0.08     0.09
C6           –         -0.05     0.06
C7           –         -1.55     –
Av of H7     +1.76     +2.13     –
C8           –         -1.16     -0.01
Av of H8     +1.90     +1.47     –
C9           –         -1.25     -0.01
Av of H9     +2.45     +1.10     –
C10          –         +0.75     0.00
H10          -0.11     -0.10     –
C11          –         -0.01     0.00
Av of H11    –         +0.12     –
O1           –         -7.54     0.27
O2           –         -7.20     0.25
H6           -2.05     -2.32     –
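Table 7 reports couplings averaged over methyl and methylene protons. For a freely rotating methyl group, the Heller-McConnell form a = ρC (B0 + B2 cos²θ) averages to ρC (B0 + B2/2), because cos²θ + cos²(θ + 120°) + cos²(θ + 240°) = 3/2 for any θ. The sketch below checks this numerically. B0, B2 and ρC are placeholder values (assumptions), not parameters from the cited work.

```python
import math

B0, B2, RHO_C = 0.0, 58.0, 0.10   # placeholder values (assumed)


def proton_hfcc(theta_deg: float) -> float:
    """Heller-McConnell coupling (gauss) for one beta proton at dihedral theta."""
    return RHO_C * (B0 + B2 * math.cos(math.radians(theta_deg)) ** 2)


def methyl_average(theta_deg: float) -> float:
    """Average coupling over three methyl protons spaced 120 degrees apart."""
    return sum(proton_hfcc(theta_deg + k * 120.0) for k in range(3)) / 3.0


# Expected free-rotor value, independent of the methyl orientation.
free_rotor = RHO_C * (B0 + B2 / 2.0)

if __name__ == "__main__":
    for t in (0.0, 30.0, 60.0):
        print(f"theta = {t:4.1f} deg  average = {methyl_average(t):.4f} G  (free rotor {free_rotor:.4f} G)")
```

Methylene protons, by contrast, sit at dihedrals fixed by the chain conformation. Their averaged couplings therefore depend on which of the conformers discussed above is populated.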
3.2
Menaquinones and their Semiquinone Radical Anions.
Although the alkyl substituents of plastoquinones apparently perturb the bonding in p-benzoquinone and its radical anion very little and therefore have only a small effect on structure, menaquinones differ from p-benzoquinone because they have a fused ring in addition to two alkyl substituents (see Figure 1). This combination substantially alters the structures and properties of menaquinones relative to those of p-benzoquinone. Table 8, for example, shows experimental and calculated bond distances for the quinoidal part of the headgroup of 1,4-naphthoquinone (NQ) and 2,3-dimethyl-1,4-naphthoquinone (23NQ), as well as calculated bond distances for a model of menaquinone-1 in which the methyl groups on the isoprenoid chain are replaced by hydrogen atoms (MQ).(60, 61) First, the calculated bond distances for NQ differ from those determined by X-ray diffraction by an average absolute magnitude of 0.027 Å. Note that the X-ray diffraction structure is highly asymmetrical, implying that crystal packing forces may have significantly altered the structure from that of the gas phase. The CO bond distances calculated for NQ, 1.226 Å, are almost identical to those calculated for p-benzoquinone, 1.225 Å. The calculated C(2)-C(3) bond distance of 1.344 Å in NQ is only 0.001 Å longer than the corresponding bond in p-benzoquinone. Similarly, the calculated C(1)-C(2) and C(3)-C(4) distances of 1.485 Å are 0.002 Å longer than the C-C bond distances in p-benzoquinone. Bond distances near the fused ring of NQ differ more substantially from the corresponding bond lengths in p-benzoquinone. The calculated C(1)-C(9) and C(4)-C(10) distances of 1.492 Å are 0.009 Å longer than the analogous bonds in p-benzoquinone. The C(9)-C(10) bond (1.409 Å) and the other CC bonds in the fused ring (1.386 Å to 1.397 Å, not shown in the table) appear more characteristic of an aromatic ring.
Next, published work showed that calculated bond distances for 23NQ have an average absolute difference of 0.015 Å from X-ray diffraction bond distances, so the calculated and experimental structures agree better than those for NQ. Bond distances calculated for 23NQ are very similar to those for NQ. The CO distances calculated for 23NQ are only 0.001 Å longer (1.227 Å) than the CO distances in NQ. The largest differences in bond distances are evident near the two methyl groups: the C(2)-C(3) distance expands by 0.014 Å, to 1.358 Å, in 23NQ, and both the C(1)-C(2) and C(3)-C(4) distances increase by 0.010 Å, to 1.495 Å. All other bond distances change by less than 0.007 Å in 23NQ compared with NQ. Finally, Table 8 shows that bond distances in MQ are very similar to those in 23NQ, but MQ is slightly less symmetrical. One CO distance in MQ is identical to those in 23NQ (1.227 Å), but the other is very slightly longer (1.229 Å). Only one other bond distance in MQ differs by as much as 0.003 Å from the corresponding distance in 23NQ. Consequently, fusing a ring to p-benzoquinone to form NQ causes major structural perturbations, and adding two alkyl chains at C(2) and C(3) causes localized structural changes near C(2) and C(3), whereas the structures of MQ and 23NQ are nearly identical. Table 8 also shows that one-electron reduction of MQ to its radical anion, MQ•−, causes structural changes concentrated in the quinoidal portion of the headgroup that are predictable from what we have learned about other p-benzoquinones. The CO and C(2)-C(3) bonds lengthen by 0.04 Å and 0.03 Å, respectively. The C(1)-C(2) and C(3)-C(4) bonds each shorten by approximately 0.05 Å, while the C(1)-C(9) and C(4)-C(10) bonds shorten by almost 0.02 Å. Although not all are shown in the table, the C(9)-C(10), C(5)-C(10), C(8)-C(9), and C(6)-C(7) bonds all lengthen by about 0.01 Å upon reduction, while the C(5)-C(6) and C(7)-C(8) bonds shorten by almost the same amount.

Table 8 Selected bond distances (Å) for naphthoquinone (NQ) and 2,3-dimethylnaphthoquinone (23NQ) determined from X-ray diffraction(87, 88) and B3LYP/6-31G(d) calculations,(60) and calculated bond distances for menaquinone (MQ) and its radical anion (MQ•−).(60)

Bond     NQ X-ray  NQ Calc.  23NQ X-ray  23NQ Calc.  MQ Calc.  MQ•− Calc.
C1=O     1.22      1.226     1.24        1.227       1.227     1.265
C4=O     1.21      1.226     1.23        1.227       1.229     1.266
C2=C3    1.31      1.344     1.34        1.358       1.358     1.391
C9=C10   1.39      1.409     1.40        1.404       1.404     1.419
C1-C2    1.48      1.485     1.47        1.495       1.498     1.453
C3-C4    1.45      1.485     1.45        1.495       1.493     1.448
C4-C10   1.46      1.492     1.49        1.489       1.488     1.470
C1-C9    1.43      1.492     1.45        1.489       1.489     1.472
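The reduction-induced changes quoted above can be read directly off the MQ and MQ•− columns of Table 8. The short sketch below tabulates Δr = r(MQ•−) − r(MQ) for each listed bond and labels it. The values are transcribed from Table 8; the 0.005 Å cut-off for calling a change significant is an arbitrary assumption.

```python
# Calculated bond distances (angstrom) from Table 8: (MQ, MQ radical anion)
MQ_BONDS = {
    "C1=O":   (1.227, 1.265),
    "C4=O":   (1.229, 1.266),
    "C2=C3":  (1.358, 1.391),
    "C9=C10": (1.404, 1.419),
    "C1-C2":  (1.498, 1.453),
    "C3-C4":  (1.493, 1.448),
    "C4-C10": (1.488, 1.470),
    "C1-C9":  (1.489, 1.472),
}
THRESHOLD = 0.005  # angstrom; assumed cut-off for a significant change


def reduction_shifts(bonds):
    """Return {bond: (delta_r, label)} with delta_r = r(anion) - r(neutral)."""
    out = {}
    for name, (neutral, anion) in bonds.items():
        d = round(anion - neutral, 3)   # remove floating-point noise
        if d > THRESHOLD:
            label = "lengthens"
        elif d < -THRESHOLD:
            label = "shortens"
        else:
            label = "unchanged"
        out[name] = (d, label)
    return out


shifts = reduction_shifts(MQ_BONDS)

if __name__ == "__main__":
    for name, (d, label) in shifts.items():
        print(f"{name:7s} {d:+.3f} A  {label}")
```

The output reproduces the pattern in the text: the C=O and C(2)=C(3) bonds lengthen, and the C-C single bonds of the quinoid ring shorten. This is the signature of an electron added to a p-benzoquinone-like LUMO.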
These changes in bonding are consistent with the odd electron entering an orbital with a nodal structure similar to that described for p-benzoquinone (Figure 6),(62) implying that the fused ring and isoprenoid chain have little effect on the LUMO of p-benzoquinone.

Selected harmonic vibrational frequencies for MQ and MQ− are displayed in Table 9.(60, 61) For neutral MQ, the calculated antisymmetric and symmetric CO stretching modes appear at 1670 cm⁻¹ and 1665 cm⁻¹, respectively, but are too close together to distinguish reliably. Not far below the CO stretching modes are a C(2)=C(3) stretch at 1610 cm⁻¹ and two fused-ring C=C stretches at 1584 cm⁻¹ and 1570 cm⁻¹. The scaled, calculated frequencies compare extremely well with the experimentally determined CO stretching modes measured at 1657 cm⁻¹ and 1672 cm⁻¹, the C(2)=C(3) band observed at 1621 cm⁻¹, and the fused-ring C=C stretches at 1596 cm⁻¹ and 1582 cm⁻¹.(89-91) All other experimentally measured vibrational bands listed in Table 9 are also extremely well reproduced by the calculations. In the radical anion MQ−, the CO stretches are shifted downward to 1501 cm⁻¹ for the antisymmetric stretch and 1484 cm⁻¹ for the symmetric stretch, placing them both below the predicted frequencies of the C=C stretches, which are calculated to appear at 1577 cm⁻¹ for the C(2)=C(3) stretch and at 1585 cm⁻¹ and 1510 cm⁻¹ for the two aromatic C=C modes. All of these modes, as well as the C-C stretch calculated at 1325 cm⁻¹, appear in the correct relative order compared with experiment and agree well with experimentally measured frequencies.

Although calculated spin densities for MQ− are perturbed very little compared with those of p-benzosemiquinone anion (compare Tables 10 and 5), calculated isotropic hyperfine coupling constants are altered substantially. Table 10 shows that the magnitudes of calculated proton hyperfine coupling constants compare well with experimental values.(92) Comparing the hyperfine coupling constants of MQ− with those of p-benzosemiquinone radical anion shows that the largest differences are found for the ring carbons. Apparently, the unpaired electron of MQ− polarizes spin density at the nuclei differently than it does in p-benzosemiquinone radical anion, even though the spin density distributions in the two radicals are very similar.

Table 9. Approximate mode descriptions, experimentally measured vibrational frequencies (cm⁻¹),(89-91) and scaled, calculated harmonic vibrational frequencies (cm⁻¹)(60, 61) for selected normal modes of menaquinone (MQ) and its radical anion (MQ−). Calculations were done with the B3LYP/6-31G(d) method. Frequencies greater than 1000 cm⁻¹ were scaled by 0.9614, and those less than 1000 cm⁻¹ were scaled by 1.0013.

| Assignment | MQ Expt. | MQ Calc. | MQ− Expt. | MQ− Calc. |
|---|---|---|---|---|
| C=O antisymmetric stretch | 1657 | 1670 | 1505 | 1501 |
| C=O symmetric stretch | 1672 | 1665 | 1442 | 1484 |
| C2-C3 stretch | 1618 | 1610 | 1605 | 1577 |
| C-C stretch | 1596 | 1584 | 1539 | 1585 |
| C-C stretch | 1583 | 1570 | | 1510 |
| C-H bend | 1459 | 1466 | | 1432 |
| H-C-H bend | 1438 | 1457 | | 1458 |
| H-C-H bend | 1378 | 1373 | | 1345 |
| C-C stretch | 1337 | 1328 | 1339 | 1325 |
| C-C stretch | 1299 | 1269 | | 1198 |
| C-H wag | 969 | 974 | | 980 |
| C-H wag | 949 | 994 | | 965 |
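The two-factor scaling convention stated in the Table 9 caption can be sketched as below, together with the mean absolute error of the neutral-MQ rows for which Table 9 lists all four values. Two details are assumptions, since the caption does not specify them: the 1000 cm⁻¹ cutoff is tested on the unscaled harmonic value, and a mode falling exactly at the cutoff is treated as low-frequency. The function names are illustrative.

```python
HIGH_SCALE = 0.9614  # unscaled harmonic frequencies above 1000 cm^-1 (Table 9 caption)
LOW_SCALE = 1.0013   # unscaled harmonic frequencies below 1000 cm^-1


def scale_frequency(harmonic_cm1: float, cutoff_cm1: float = 1000.0) -> float:
    """Apply the two-factor scaling to an unscaled harmonic frequency (cm^-1).

    Assumption: the cutoff is tested on the unscaled value, and a mode exactly
    at the cutoff is scaled as low-frequency.
    """
    factor = HIGH_SCALE if harmonic_cm1 > cutoff_cm1 else LOW_SCALE
    return harmonic_cm1 * factor


# (experimental, scaled calculated) pairs for neutral MQ, taken from the
# Table 9 rows that list all four values (cm^-1)
MQ_PAIRS = [(1657, 1670), (1672, 1665), (1618, 1610), (1596, 1584), (1337, 1328)]


def mean_abs_error(pairs) -> float:
    """Mean |calc - expt| over (expt, calc) pairs."""
    return sum(abs(calc - expt) for expt, calc in pairs) / len(pairs)


if __name__ == "__main__":
    print(f"2000 cm^-1 harmonic -> {scale_frequency(2000.0):.1f} cm^-1 scaled")
    print(f"500 cm^-1 harmonic  -> {scale_frequency(500.0):.2f} cm^-1 scaled")
    print(f"MQ mean |calc - expt| = {mean_abs_error(MQ_PAIRS):.1f} cm^-1")
```

The two factors pull in opposite directions: above 1000 cm⁻¹ the harmonic values are reduced by almost 4%, while the low-frequency modes are nudged up by about 0.1%.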
Recent analysis of anisotropic hyperfine couplings, calculated by using the B3LYP/EPR-II//PM3 method and including models for hydrogen bonding to the carbonyl oxygens,(93) shows spin redistribution onto the heavy atoms of the hydrogen-bond donors and toward the carbonyl carbons of the model for MQ−. Agreement between hyperfine couplings calculated for models incorporating hydrogen bonds to each oxygen atom and couplings measured in alcohol is reportedly excellent, whereas agreement with data from bacterial photosynthetic reaction centers is reportedly very good for a model with an imidazole hydrogen bond only to O4. Thus, for the structural models tested, very good agreement between calculated and experimental hyperfine couplings was obtained.

Table 10. Calculated(60) and experimental(92) isotropic hyperfine coupling constants (hfcc's) and spin densities for the menasemiquinone radical anion (MQ−). Calculations were done by using the B3LYP/(632141)//6-31G(d) method.

| Atom | hfcc Expt. | hfcc Calc. | Spin density Calc. |
|---|---|---|---|
| H5 | 0.30 | -0.11 | 0.00 |
| H6 | 0.74 | -0.68 | 0.00 |
| H7 | 0.74 | -0.72 | 0.00 |
| H8 | 0.26 | -0.07 | 0.00 |
| H11 | 2.63 | 2.76 | 0.00 |
| H12 | 1.18 | 1.18 | 0.00 |
| H13 | | -0.10 | 0.00 |
| H14 | | 0.03 | 0.00 |
| O1 | | -6.67 | 0.24 |
| O4 | | -6.23 | 0.22 |
| C1 | | -2.24 | 0.09 |
| C2 | | 1.10 | 0.11 |
| C3 | | 0.08 | 0.09 |
| C4 | | -1.55 | 0.10 |
| C5 | | -1.03 | 0.00 |
| C6 | | 0.37 | 0.03 |
| C7 | | 0.45 | 0.03 |
| C8 | | -1.12 | 0.00 |
| C9 | | -0.72 | 0.03 |
| C10 | | -0.83 | 0.05 |
| C11 | | -1.91 | -0.01 |
| C12 | | -1.46 | -0.01 |
| C13 | | 2.17 | 0.01 |
| C14 | | 0.04 | 0.00 |
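For a doublet radical such as MQ−, atomic spin densities should add up to roughly one unpaired electron, and the statement that MQ− resembles p-benzosemiquinone rests on where that spin sits. The sketch below sums the calculated Table 10 values, assuming they are atomic (for example, Mulliken) spin populations; the caption does not say which partitioning was used, and the helper name is illustrative.

```python
# Calculated spin densities for MQ- from Table 10; hydrogens (all 0.00) are omitted.
SPIN_DENSITY = {
    "O1": 0.24, "O4": 0.22,
    "C1": 0.09, "C2": 0.11, "C3": 0.09, "C4": 0.10,
    "C5": 0.00, "C6": 0.03, "C7": 0.03, "C8": 0.00,
    "C9": 0.03, "C10": 0.05,
    "C11": -0.01, "C12": -0.01, "C13": 0.01, "C14": 0.00,
}


def spin_on(atoms, densities=SPIN_DENSITY) -> float:
    """Sum of the tabulated spin densities over the given atom labels."""
    return sum(densities[atom] for atom in atoms)


if __name__ == "__main__":
    total = spin_on(SPIN_DENSITY)  # iterating the dict yields every atom label
    head = spin_on(["O1", "O4", "C1", "C2", "C3", "C4"])
    print(f"total listed spin density: {total:.2f}")
    print(f"carbonyl O + C1-C4: {head:.2f} ({head / total:.0%} of the listed total)")
```

The listed values sum to 0.98, within the rounding of sixteen two-decimal entries, and most of the spin sits on the carbonyl oxygens and quinone ring carbons C1-C4.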
3.3 Ubiquinones and their Semiquinone Radical Anions
Compared with p-benzoquinone, plastoquinones, and menaquinones, the structures of ubiquinones show several unique features related to their two adjacent methoxy groups (see Figure 9). Figure 9 compares the X-ray diffraction structure of 2,3-dimethoxy-6-methyl-p-benzoquinone (UQ-0)(94) with calculated structures of UQ-0, UQ (a model of UQ-1 with isoprenoid methyl groups replaced by hydrogens), and UQ−.(58) First, the maximum difference between experimental and calculated ring carbon-carbon bond distances is 0.037 Å, and the calculated CO and ring carbon-carbon bond distances differ from experimental values by an average absolute magnitude of 0.012 Å. This average difference is less than the conventional criterion for deciding whether two experimental bond distances are identical (three times the standard deviation in bond distances, 3σ = 0.021 Å here), so the calculated CO and ring carbon-carbon distances of UQ-0 agree well with the X-ray diffraction distances. For UQ-0, one carbon-oxygen torsional angle places a methoxy carbon 10.6° above the ring plane, compared with 3.2° in the X-ray structure. The second methoxy torsional angle of UQ-0 is calculated as 122.8°, whereas the experimental structure places this methoxy group at -112.7°, below the plane in Figure 9. However, a second, local minimum-energy conformation with this torsional angle at -123.6° was calculated only 10.4 kcal/mole higher in energy than the global minimum.

Figure 9. Bond distances from (a) the X-ray diffraction structure of 2,3-dimethoxy-6-methyl-p-benzoquinone(94) and (b) B3LYP/6-31G(d) calculations for 2,3-dimethoxy-6-methyl-p-benzoquinone. Calculated bond distances and one methoxy torsional angle for (c) ubiquinone-1 and (d) the radical anion of ubiquinone-1.

An analogous X-ray diffraction structure, of 2,3-dimethoxy-5-prenyl-p-benzoquinone,(95) shows the corresponding methoxy group with a torsional angle of 120.2°, very similar to that calculated for UQ-0. The near coplanarity of only one methoxy substituent with the quinone ring is consistent with quantum chemical calculations of 2-methoxy- and 2,3-dimethoxy-p-benzoquinone,(96) and the overall structure calculated by using the B3LYP/6-31G(d) method agrees well with published X-ray diffraction results. In addition, the calculated structures of UQ-0 and UQ agree well with each other. The largest effects on bond distances of adding an isoprenyl chain to UQ-0 appear near the chain: the C(4)-C(5) distance increases by 0.015 Å and the C(5)-C(6) distance lengthens by 0.009 Å. Other bond length differences within the ring are 0.003 Å or less. Conformations of the methoxy groups are also similar in the two molecules: the torsional angles of UQ's methoxy groups are 9.7° and 122.8°, compared with 10.6° and 123.0° for UQ-0. Like UQ-0, UQ displays a second low-energy conformation, at -121.0° and 0.12 kcal/mole higher in energy than the global minimum-energy conformation.

Reducing UQ to UQ− causes profound structural changes. First, CO bond distances increase by 0.047 Å (to 1.269 Å) and 0.040 Å (to 1.271 Å). C=C bonds also expand significantly, to 1.381 Å and 1.378 Å, and ring C-C bonds contract by amounts ranging from 0.025 Å to 0.048 Å. As for the quinones described previously, bond length changes within UQ's head group are consistent with the added electron entering an orbital with the same nodal structure as p-benzoquinone's LUMO. Unique and intriguing features of ubiquinone reduction are evident, however, in the orientations of the methoxy substituents and their distances from the quinone ring of UQ−. Ring carbon-methoxy oxygen bond lengths expand and become nearly equal (1.383 Å and 1.382 Å) upon reduction of UQ to UQ−.
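The 3σ criterion used above for UQ-0 is simple enough to state as code. This minimal sketch, with an illustrative function name, applies it both to the 0.012 Å average difference and to the 0.037 Å largest ring C-C difference quoted in the text, using σ = 0.021/3 = 0.007 Å. It shows that the average passing the test does not mean every individual bond distance does.

```python
def indistinguishable(difference: float, sigma: float) -> bool:
    """Conventional test: two distances are taken as identical if they
    differ by less than three standard deviations."""
    return abs(difference) < 3.0 * sigma


# The text gives 3σ = 0.021 Å for the UQ-0 X-ray structure, so σ = 0.007 Å.
SIGMA_UQ0 = 0.021 / 3.0

if __name__ == "__main__":
    # Average |calc - X-ray| for the CO and ring C-C distances of UQ-0
    print(indistinguishable(0.012, SIGMA_UQ0))  # True
    # Largest single ring C-C discrepancy
    print(indistinguishable(0.037, SIGMA_UQ0))  # False
```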
The methoxy groups' torsional angles also show a remarkable change. Although one methoxy C-C-O-C torsional angle increases by only 2.1° upon reducing UQ, the second methoxy torsional angle changes from 9.7° in UQ to 123.2° in UQ−. To our knowledge, this torsional angle change upon reduction had not been previously noted, although several workers had proposed that the conformation of a ubiquinone's methoxy groups could modulate its reduction potential.(94, 97, 98)

Table 11 lists several scaled, calculated vibrational frequencies of UQ and UQ−, along with experimentally measured frequencies.(81, 89, 90, 99-101) The table focuses on C=O and C=C stretching and methoxy bending frequencies because these frequencies have been measured experimentally and are key to inferences concerning the influence of the protein on ubiquinone structure. For neutral UQ, the mode calculated at a scaled frequency of 1683 cm⁻¹ is a stretching vibration concentrated at the C=O bond located meta to the isoprenoid chain and corresponds most closely to the recently observed UQ-1 C=O stretching mode at 1664 cm⁻¹. The UQ mode calculated at 1649 cm⁻¹ is a C=O stretch concentrated on the C=O bond ortho to the isoprenoid chain and agrees with the assignment of the band observed at 1644 cm⁻¹ for UQ-1. In contrast to the C=O stretching modes, the ring C=C stretches mix to form symmetric and antisymmetric modes involving both C=C bonds. The UQ vibration calculated at a scaled frequency of 1644 cm⁻¹ represents the symmetric C=C stretch; it nearly overlaps one calculated C=O vibrational frequency and coincides with the experimentally measured frequency of a C=O band. Although the lower-frequency C=C stretch (calculated at 1599 cm⁻¹) matches the experimentally assigned band at 1614 cm⁻¹ well, experiments typically detect only one C=C stretching band. It was therefore proposed that the higher-frequency, low-intensity, symmetric C=C stretch is not resolved because it nearly overlaps the lower-frequency, much more intense C=O stretching mode. The C-O-CH3 bending modes were calculated at 1186 cm⁻¹ and 1180 cm⁻¹. These calculated frequencies are substantially lower than the experimentally observed C-O-CH3 bending modes at 1288 cm⁻¹ and 1266 cm⁻¹ for UQ-1, and the calculated modes appear substantially mixed, in contrast with the experimental inference that the higher-frequency C-O-CH3 bending mode corresponds to the methoxy group in the plane of the ring. We also note that the calculated frequencies have been scaled down by almost 4% and that, in this case, a scaling factor specific to bending modes might give better agreement with experiment.(66) Finally, the methoxy torsional modes appear at very low scaled frequencies, 97 cm⁻¹ and 93 cm⁻¹ (not shown in the table). Each torsional mode is localized at one methoxy substituent, with the mode at 97 cm⁻¹ involving torsions of the methoxy group para to the isoprenoid chain. Definitive answers to questions of local-mode mixing to form low-frequency normal modes must await further work, however, because the extent of mode mixing is highly dependent on the force constants used, force constants for such low-frequency modes are small, and a small error in these force constants would have a large effect on mode mixing.

Table 11. Approximate mode descriptions, experimental,(81, 89, 90, 99-101) and scaled, calculated(58) vibrational frequencies (cm⁻¹) for selected normal modes of a ubiquinone-1 (UQ) and ubisemiquinone-1 anion (UQ−) model. Calculations were done by using the B3LYP/6-31G(d) method, and frequencies were scaled by 0.9614.(38)

| Assignment | UQ Expt. | UQ Calc. | UQ− Expt. | UQ− Calc. |
|---|---|---|---|---|
| C1-O1 stretch | 1664 | 1683 | 1486 | 1492 |
| C4-O2 stretch | 1644 | 1649 | 1466 | 1482 |
| C2-C3/C5-C6 stretch (sym) | | 1644 | 1617 | 1590 |
| C2-C3/C5-C6 stretch (antisym) | 1614 | 1599 | 1527 | 1496 |
| Methoxy bend | 1288 | 1186 | | 1180 |
| Methoxy bend | 1266 | 1180 | | 1176 |

In UQ−, the symmetric C=C stretching vibration is calculated at a scaled frequency of 1590 cm⁻¹ and the antisymmetric C=C stretch appears at 1506 cm⁻¹. The calculated frequencies agree well with the experimentally measured frequencies of 1617 cm⁻¹ and 1527 cm⁻¹ for the UQ-1 anion. The antisymmetric CO stretch is calculated at 1492 cm⁻¹, followed by the symmetric CO stretching mode at 1482 cm⁻¹. Although experiments imply that the CO stretching modes of the UQ-1 anion appear at 1486 cm⁻¹ and 1466 cm⁻¹, both CO stretching modes were observed in the range from 1482 to 1500 cm⁻¹ for the UQ-0 and UQ-10 anions. Since the CO stretching modes of UQ− (calculated at 1492 cm⁻¹ and 1482 cm⁻¹) appear at nearly the same frequencies as several CH bending modes (calculated at 1493 cm⁻¹ and 1475 cm⁻¹), it may prove necessary to perform deuterium isotopic substitution experiments on the methoxy and/or methyl hydrogens to shift the CH bending frequencies and thus pinpoint the CO stretching modes experimentally. Finally, the methoxy torsional vibrations are mixed and distributed over three modes, at 100, 99, and 80 cm⁻¹ (not shown in Table 11).

Calculations by Nonella and co-workers(96) for methoxy-p-benzoquinone and 2,3-dimethoxy-p-benzoquinone show that gross structural features are similar to those for UQ, but also show three different minimum-energy conformations for the dimethoxy quinone. Nonella et al. also extended their vibrational analysis to include calculated intensities and isotopic frequency shifts, mapped the energy surface for rotation of the methoxy groups, and investigated the effects of methoxy conformations on vibrational frequencies. Although the conformational dependence of modes in the range from 1500 to 1700 cm⁻¹ was rather small, making it difficult to discern the predominant conformation in solution, Nonella and Brändli found the best agreement between calculated and measured vibrational frequencies for a conformation with one methoxy group in the ring plane and the other twisted substantially out of plane. Their very detailed study further implies that structural distortions of ubiquinones by their protein environment may be modeled by using density functional calculations to infer molecular geometries from vibrational spectra. Similar calculations for the corresponding radical anion likewise showed three minimum-energy conformations for the two methoxy groups, but implied a much weaker conformational dependence of molecular vibrations.(102)
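The deuteration strategy suggested above rests on the mass dependence of vibrational frequencies. The rough sketch below treats each C-H vibration as an isolated two-body C-H oscillator whose force constant is unchanged by deuteration. This is a strong simplification that ignores mixing with the rest of the molecule, so the numbers are estimates only. It shows why replacing methoxy or methyl hydrogens with deuterium should move the CH bends near 1493 cm⁻¹ and 1475 cm⁻¹ well away from the CO stretches calculated at 1492 cm⁻¹ and 1482 cm⁻¹. The function names are illustrative.

```python
import math

# Standard isotopic masses (u)
M_C, M_H, M_D = 12.000, 1.00783, 2.01410


def reduced_mass(m1: float, m2: float) -> float:
    return m1 * m2 / (m1 + m2)


def deuterated_frequency(nu_h: float) -> float:
    """Estimate the C-D frequency from a C-H frequency (cm^-1).

    Harmonic two-body model: nu is proportional to sqrt(k / mu), with k
    unchanged on deuteration; coupling to other coordinates is ignored.
    """
    ratio = math.sqrt(reduced_mass(M_C, M_H) / reduced_mass(M_C, M_D))
    return nu_h * ratio


if __name__ == "__main__":
    # Calculated CH bending frequencies of UQ- quoted in the text (cm^-1)
    for nu in (1493.0, 1475.0):
        print(f"CH mode at {nu:.0f} -> roughly {deuterated_frequency(nu):.0f} cm^-1 after deuteration")
```

In this crude model the frequency ratio is about 0.73, so the CH bends would move several hundred cm⁻¹ below the CO stretching region.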
Table 12. Experimental hyperfine coupling constants for ubisemiquinone radical anions(103-105) compared with calculated hyperfine coupling constants and spin densities(58) for the ubisemiquinone-1 radical anion (UQ−) from B3LYP/6-31G(d)//6-31G(d) and B3LYP/(632141)//6-31G(d) calculations.
Spin Densities
Hyperfine Coupling Constants Atom Number
6-31G(d)
(632141)
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 O1 O2 O3 O4 Avg. C7 protons Avg. C8 protons Avg. C9 protons Avg. C10 protons C11 proton Avg. C13 protons Avg. C14 protons
+0.51 +0.99 +0.58 +1.30 +1.45 +2.72 +0.24 +0.10 -1.60 -1.23 +1.45 +0.17 -0.04 +0.10 -8.67 -8.06 -0.77 -0.65 -0.03 -0.03 +2.30 +0.95
-2.15 -0.46 -0.75 -1.53 -0.27 +0.40 +0.46 +0.29 -1.66 -1.32 +1.66 +0.10 -0.04 +0.13 -7.11 -6.80 -0.61 -0.56 -0.02 -0.02 +2.24 +0.95
-0.05 +0.03 +0.06
Expt.
6-31G(d)
(632141)
0.087 0.060 0.052 0.097 0.074 0.101 0.002 0.001 -0.011 -0.008 0.003 0.003 0.000 0.000 0.270 0.251 0.003 0.003 0.000 0.000 0.003 0.001
0.095 0.078 0.080 0.094 0.065 0.072 0.000 0.000 -0.006 -0.006 0.007 0.001 0.001 0.000 0.248 0.236 0.008 0.008 0.000 0.001 0.004 0.002
-0.08 +0.03
-0.001 0.000
0.000 0.000
+0.05
0.000
0.000
-0.93 0.65-0.8 1.5-1.6
1.51-1.71
2.09-2.20 1.04-1.06

Isotropic hyperfine coupling constants have also been calculated for UQ− and are summarized in Table 12.(58) The table shows good agreement between calculated and experimentally determined values,(103-105) except for the carbonyl carbon C1. Additional computational work incorporating hydrogen bonding to UQ− shows that a hydrogen bond to the attached oxygen, O1, increases the spin density on C1 and decreases the magnitude of the ¹³C1 hyperfine coupling.(106) These calculations are thus consistent with the proposed strong hydrogen bond to O1 of QA in bacterial photosynthetic reaction centers. In a different study, the isotropic hyperfine couplings of β-protons were found to display the expected cosine-squared dependence on ethyl side-chain conformations (a model for the isoprenyl chain conformations) and to imply a structural distortion of the chain to a dihedral angle of 70° or 120° for QA− in the Rb. sphaeroides reaction center.(86)
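The angular dependence of β-proton couplings is usually described by the Heller-McConnell relation, a(Hβ) = ρC (B0 + B2 cos²θ), where ρC is the spin density on the adjacent carbon and θ is the dihedral angle. The sketch below uses illustrative parameter values, not values from the study cited, and does not attempt to reproduce its 70° or 120° result. It only shows why inverting a single measured coupling yields more than one candidate dihedral angle: cos²θ takes the same value at θ and 180° - θ. The function names are hypothetical.

```python
import math


def beta_proton_hfcc(theta_deg: float, rho_c: float, b0: float, b2: float) -> float:
    """Heller-McConnell relation: a(H_beta) = rho_C * (B0 + B2 * cos^2(theta))."""
    c = math.cos(math.radians(theta_deg))
    return rho_c * (b0 + b2 * c * c)


def candidate_dihedrals(a_obs: float, rho_c: float, b0: float, b2: float) -> tuple:
    """Invert the relation; because cos^2 is symmetric, theta and 180 - theta both fit."""
    x = (a_obs / rho_c - b0) / b2
    if not 0.0 <= x <= 1.0:
        raise ValueError("coupling lies outside the range this model can produce")
    theta = math.degrees(math.acos(math.sqrt(x)))
    return theta, 180.0 - theta


if __name__ == "__main__":
    # Illustrative parameters only (not taken from the chapter or ref. 86)
    rho_c, b0, b2 = 0.10, 0.0, 50.0
    a = beta_proton_hfcc(60.0, rho_c, b0, b2)
    print(f"a(60 deg) = {a:.3f}; candidate dihedrals: {candidate_dihedrals(a, rho_c, b0, b2)}")
```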
4. SEMIQUINONE RADICAL ANIONS IN PLANT PHOTOSYSTEM II

In photosystem II (PSII) from green plants, the functions of tyrosyl radical intermediates in the reduction of the native plastoquinones remain uncertain.(107) Furthermore, deconvoluting the in vivo vibrational spectra of tyrosyl radicals from those of plastosemiquinone radical anions is very difficult.(80, 108) Recent HF/DF calculations provided a glimpse of the future by showing what a powerful complement to experiment computational chemistry can be. In the work described next, HF/DF calculations were combined with vibrational spectroscopy and isotopic labeling to identify vibrational bands due to PQ-9 and PQ-9− in situ.(80) The in situ vibrational spectra were then correlated with spectral components observed in PSII. Table 13 summarizes a comparison of scaled, calculated vibrational frequencies with difference FTIR spectroscopic experiments. The calculations incorporated a water molecule at each end of PQ or PQ− to model hydrogen bonding. Experiments were carried out for decyl-PQ in water, PQ-9 in water, and QA (the primary plastoquinone) in PSII, as well as their one-electron reduced forms. Comparing (scaled) calculated and experimental vibrational frequencies and intensities shows that the band observed near 1653 cm⁻¹ for decyl-PQ/PQ-9, and at 1659 cm⁻¹ for QA in PSII, corresponds most closely to the antisymmetric CO stretching mode calculated at 1649 cm⁻¹. The ring C=C/CO symmetric stretching mode calculated at 1632 cm⁻¹ corresponds closely to the band observed at 1633 cm⁻¹ for decyl-PQ/PQ-9 and at 1631 cm⁻¹ for QA. Finally, the ring C=C antisymmetric stretch calculated at 1604 cm⁻¹ most probably matches the band seen near 1617 cm⁻¹ for decyl-PQ/PQ-9 and at 1611 cm⁻¹ for QA. Upon reducing PQ to its semiquinone anion, the CO antisymmetric stretching mode is calculated to shift down by 167 cm⁻¹, to 1482 cm⁻¹. The same mode appears near 1454 cm⁻¹ in decyl-PQ−/PQ-9− and at 1469 cm⁻¹ in QA−. Likewise, the ring C=C antisymmetric stretching mode is calculated to shift to 1498 cm⁻¹, close to the frequencies observed for decyl-PQ−/PQ-9− (1474/1471 cm⁻¹) and QA− (1482 cm⁻¹). Although the ring C=C/CO symmetric stretch calculated at 1467 cm⁻¹ was not observed, a ring CH bend/CC stretch calculated at 1400 cm⁻¹ was detected near 1418 cm⁻¹ for decyl-PQ− and 1409 cm⁻¹ for PQ-9−. Calculated deuterium isotopic frequency shifts were also consistent with frequency shifts measured for decyl-PQ, PQ-9, and QA, as well as their radical anions. Thus, calculations proved a vital complement to experiment for assigning vibrational modes of PQ-9 and its anion measured in situ. Despite these successes, however, these computational studies by themselves have not established the hydrogen-bond geometry.

Table 13. Approximate mode descriptions, experimentally measured vibrational frequencies (cm⁻¹)(80) of decyl-PQ and decyl-PQ−, and scaled, calculated harmonic vibrational frequencies (cm⁻¹) for selected modes of plastoquinone-1 and plastosemiquinone-1 anion, each contacted by two water molecules.(80) Calculations were done with the B3LYP/6-31G(d) method. Frequencies were scaled by multiplication by 0.9614.(38)
PO Assignment
Expt.
PQ Calc.
Expt.
1664
Ring C=C/C=O Symmetric Stretch
Calc. 1602
1653
1649
Ring C=C/C=O Symmetric Stretch 1633
1632
1617
1604
1474
1498
1343
1418
1400
CO Antisymmetric Stretch
Ring CC Antisymmetric Stretch Ring C-H bend/C-C stretch
1454
1482 1467
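The frequency matches described above can be tabulated in a few lines. This sketch uses only the decyl-PQ and decyl-PQ− values quoted in the text, not the Table 13 layout, and the helper names are illustrative. It computes the calculated-minus-observed deviations and the calculated shifts on reduction.

```python
# (calculated, observed) frequencies in cm^-1 quoted in the text for decyl-PQ
PQ_MODES = {
    "CO antisym stretch": (1649, 1653),
    "ring C=C/CO sym stretch": (1632, 1633),
    "ring C=C antisym stretch": (1604, 1617),
}
# (calculated, observed) frequencies in cm^-1 quoted in the text for decyl-PQ-
PQ_ANION_MODES = {
    "CO antisym stretch": (1482, 1454),
    "ring C=C antisym stretch": (1498, 1474),
    "ring CH bend/CC stretch": (1400, 1418),
}


def mean_abs_error(modes: dict) -> float:
    """Mean |calculated - observed| over the listed modes (cm^-1)."""
    errors = [abs(calc - obs) for calc, obs in modes.values()]
    return sum(errors) / len(errors)


def reduction_shifts(neutral: dict, anion: dict) -> dict:
    """Calculated shift (anion minus neutral) for modes present in both sets."""
    return {name: anion[name][0] - neutral[name][0] for name in neutral if name in anion}


if __name__ == "__main__":
    print(f"PQ  mean |calc - obs|: {mean_abs_error(PQ_MODES):.1f} cm^-1")
    print(f"PQ- mean |calc - obs|: {mean_abs_error(PQ_ANION_MODES):.1f} cm^-1")
    for name, shift in reduction_shifts(PQ_MODES, PQ_ANION_MODES).items():
        print(f"{name}: {shift:+d} cm^-1 on reduction")
```

For these modes, the mean deviation is about 6 cm⁻¹ for the neutral quinone and about 23 cm⁻¹ for the anion. The CO antisymmetric stretch shows the 167 cm⁻¹ downshift described above.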
5. CONCLUSIONS AND FUTURE DIRECTIONS

Even where structures of quinone-protein complexes are available from X-ray diffraction experiments, the structures, side-chain conformations, and intermolecular contacts with proteins of the corresponding quinoidal radicals must usually be inferred indirectly from spectroscopic data. The primary spectroscopic methods used to infer structures of quinoidal radicals in photosynthetic reaction center proteins probe molecular vibrations and spin properties. Directly measurable quantities that are also sensitive to intramolecular structure and non-covalent contacts, such as vibrational frequencies, hyperfine coupling constants, and hyperfine tensor components, are most useful for inferring structure. Careful quantum chemical studies have played a critical role in relating spectroscopic measurements to structural features, and those relevant to quinoidal radicals in the photosynthetic reaction center are summarized in this review.

Although this review has emphasized results obtained from DF and HF/DF methods, ab initio MO methods have an important place in modeling biochemical radicals such as those in the photosynthetic reaction center. A major drawback of DF-based methods, the lack of a systematic way to improve approximations for electron correlation, is not shared by ab initio MO methods. In the foreseeable future, ab initio MO methods will therefore retain their place as the methods of choice for highly accurate results and for benchmarking more approximate ab initio and semi-empirical computational methods. In fact, comparisons of ab initio MO and density functional-based methods for calculating structures, vibrations, and spin properties of phenoxyl radical, p-benzoquinone, and p-benzosemiquinone radical anion provided the initial clues that DF and HF/DF methods were sufficiently accurate to infer structures by comparing calculated and experimental spectral properties.

5.1 Retrospective

Initial comparisons of density functional-derived structures, vibrations, and spin properties for phenoxyl radical, p-benzoquinones, and p-benzosemiquinone radical anions with experimental data and with high-level ab initio MO calculations indicated the exciting potential of DF and HF/DF methods to complement experimental spectroscopic work. Subsequent DF and HF/DF studies of models for plastoquinones, menaquinones, ubiquinones, and their radical anions have been used to suggest the reinterpretation of some experiments and to confirm side-chain conformations suggested by experiments in condensed phases and in proteins. Calculations for more complex structural models have also shown which proposed hydrogen-bonding contacts between radicals and photosynthetic proteins are consistent with measured hyperfine couplings and vibrational frequency data. Despite these successes, improving experimental techniques challenge computational chemists to produce more accurate computational data and to make more far-reaching predictions.

5.2 Future Promise

Constructing increasingly sophisticated structural models for radicals in proteins remains a major challenge. To develop truly predictive capabilities, computational chemists must work to incorporate more realistic models of intermolecular contacts and to model the effects of molecular dynamics on average structures, spin properties, and molecular vibrations. Such dynamical studies will naturally lead to more accurate studies of radical reactions, including not only the multi-step reduction of quinones in the photosynthetic reaction center, but also other biochemical electron transfer reactions and radical enzymatic reactions of the type reviewed elsewhere in this volume. As computational chemists strive to increase their predictive power, the demand for and development of new computational methods will also increase. Fast, accurate methods such as DF and HF/DF calculations will always have a place in modeling large biochemical systems and, indeed, recent advances in DF and HF/DF methods promise exciting progress in several areas. For example, DF-based quantum mechanical/molecular mechanical (QM/MM) methods(109) have not yet found extensive application in studies of biochemical radicals. Meanwhile, newer, more accurate methods for calculating heavy-atom hyperfine coupling constants, estimating hyperfine tensor components, and incorporating anharmonicity in vibrational frequency calculations are just a few of the many frontier areas open for new, creative ideas. It is clear that chemists have only begun to apply the tools of modern computational chemistry to study radicals in proteins, and the synergy between calculations and experiments is growing ever more important and rewarding.
REFERENCES .
2. 3.
o
o
.
7. 8. ,
10. 11.
L. Stryer. Biochemistry. W.H. Freeman & Co., New York, 1988. M. Y. Okamura and G. Feher, Annu. Rev. Biochem. 61 (1992) 861. R. A. Wheeler, in R. A. Wheeler (Ed.). Bioenergetics of Electron, Proton, and Energy Transfer. ACS Sympsium Series, Washington DC, 2000, Submitted. V. Barone, in D. P. Chong (Ed.). Recent Advances in Density Functional Methods. World Scientific, Singapore, 1995, p. 287. J. W. Gauld, L. A. Eriksson, and L. Radom, J. Phys. Chem. A 101 (1997) 1352 and references therein. A. N. Glazer and A. Melis, Annu. Rev. Plant Physiol. 38 (1987) 11. H. Michel and J. Deisenhofer, Biochemistry 27 (1988) 1. B. A. Diner, V. Petrouleas, and J. J. Wendoloski, Physiol. Plant. 81 (1991) 423. D. C. Youvan, E. J. Bylina, M. Alberti, H. Begusch, and J. E. Hearst, Cell (Cambridge Mass.) 37 (1984) 949. A. Trebst, Z. Naturforsch. 41C (1986) 240. J. Deisenhofer and H. Michel, EMBO J. 8 (1989) 2149.
687 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36.
G. Feher, J. P. Allen, M. Y. Okamura, and D. C. Rees, Nature 339 (1989) 111. J. P. Allen, G. Feher, T. O. Yeates, H. Komiya, and D. C. Rees, Proc. Natl. Acad. Sci. USA 84 (1987) 5730. J. P. Allen, G. Feher, T. O. Yeates, H. Komiya, and D. C. Rees, Proc. Natl. Acad. Sci. USA 85 (1988) 8487. U. Ermler, G. Fritzsch, S. K. Buchanan, and H. Michel, Structure 2 (1994) 925 and references therein. L. Szabo and N. S. Ostlund. Modem Quantum Chemistry, Introduction to Advanced Electronic Structure Theory. McGraw-Hill, New York, 1989. I. N. Levine. Quantum Chemistry. Prentice-Hall, Englewood Cliffs, NJ, 1991. J. Baker, A. Scheiner, and J. Andzelm, Chem. PHys. Lett. 216 (1993) 380. V. B. Luzhkov and A. S. Zyubin, J. Mol. Struc. (Theochem) 170 (1988) 33. H. Yu and J. D. Goddard, J. Mol. Struc. (Theochem) 233 (1991) 129. L. Johnston, N. Mathivanan, F. Negri, W. Siebrand, and F. Zerbetto, Can. J. Chem. 71 (1993) 1655. J. Takahashi, T. Momose, and T. Shida, Bull. Chem. Soc. Japan 67 (1994) 964. D. M. Chipman, R. Liu, X. Zhou, and P. Pulay, J. Chem. Phys. 100 (1994) 5023. Y. Qin and R. A. Wheeler, J. Chem. Phys. 102 (1995) 1689. Y. Qin and R. A. Wheeler, J. Am. Chem. Soc. 117 (1995) 6083. O. Nwobi, J. Higgins, X. Zhou, and R. Liu, Chem. Phys. Lett. 272 (1997) 155. R. Schnepf, A. Sokolowski, J. Mueller, V. Bachler, K. Wieghardt, and P. Hildebrandt, J. Am. Chem. Soc. 120 (1998) 2352. C. Adamo, R. Subra, A. Di Matteo, and V. Barone, J. Chem. Phys. 109 (1998) 10244. K. E. Wise, J. B. Pate, and R. A. Wheeler, J. Phys. Chem. B 103 (1999) 4764. C. T. Farrar, G. J. Gerfen, R. G. Griffin, D. A. Force, and R. D. B ritt, J. Phys. Chem. B 101 (1997) 6634. F. Himo, A. Graslund, and L. A. Eriksson, Biophys. J. 72 (1997) 1556. P. J. O'Malley and D. Ellson, Biochim. Biophys. Acta 1320 (1997) 65. S. E. Boesch and R. A. Wheeler, J. Phys. Chem. A (2000) In preparation. R. Liu and X. Zhou, J. Phys. Chem. 97 (1993) 9613. G. N. R. Tripathi and R. H. Schuler, J. Phys. Chem. 92 (1988) 5129. G. N. R. 
Tripathi and R. H. Schuler, J. Chem. Phys. 81 (1984) 113.
688 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52.
53. 54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65.
C. R. Johnson, M. Ludwig, and S. A. Asher, J. Am. Chem. Soc. 108 (1986) 905. A. P. Scott and L. Radom, J. Chem. Phys. 100 (1996) 16502. T. J. Stone and W. A. Waters, Proc. Chem. Soc. (1964). P. Neta and R. W. Fessenden, J. PHys. Chem. 78 (1974) 523. R. S. Mulliken, J. Chem. Phys. 23 (1952) 1833. V. Barone, C. Adamo, and R. Subra, J. Am. Chem. Soc. 117 (1995) 12618. N. Rega, M. Cossi, and V. Barone, J. Am. Chem. Soc. 119 (1997) 12962. N. Rega, M. Cossi, and V. Barone, J. Am. Chem. Soc. 120 (1998) 5723. F. Himo and L. A. Eriksson, J. Chem. Soc., Perkin Trans (1998) 305. S. E. Walden and R. A. Wheeler, J. Phys. Chem. 100 (1996) 1530. S. E. Walden and R. A. Wheeler, J. Chem. Soc., Perkin Trans. 2 (1996) 2653. S. E. Walden and R. A. Wheeler, J. Chem. Soc., Perkin Trans. 2 (1996) 2663. F. Himo and L. A. Eriksson, J. Phys. Chem. B 101 (1997) 9811. P. J. O'Malley and D. Ellson, Chem. Phys. Lett. 260 (1996) 492. G. Lassmann, L. A. Eriksson, and W. Lubitz, 103 (1999) 1283. V. G. Malkin, O. L. Malkina, L. A. Eriksson, and D. R. Salahub, in P. Politzer and J. M. Seminario Eds.). Theoretical and Computational Chemistry, Vol. 2: Modem Density Functional Theory, a Tool for Chemistry. Elsevier, Amsterdam, 1995, p. 273. Y. Qin and R. A. Wheeler, J. Phys. Chem. 100 (1996) 10554. S. E. Boesch and R. A. Wheeler, J. Phys. Chem. 99 (1995) 8125. S. E. Boesch, M.S. Thesis. Department of Chemistry and Biochemistry University of Oklahoma, Norman, 1996. S. E. Boesch and R. A. Wheeler, J. Phys. Chem. A 101 (1997) 8351. K. Hagen and K. Hedberg, J. Chem. Phys. 59 (1973) 158. S. E. Boesch and R. A. Wheeler, J. Phys. Chem. A 101 (1997) 5799. K. E. Wise, A. K. Grafton, and R. A. Wheeler, J. Phys. Chem. A 101 (1997) 1160. A. K. Grafton, S. E. Boesch, and R. A. Wheeler, J. Mol. Struc. (THEOCHEM) 392 (1997) 1. A. K. Grafton and R. A. Wheeler, J. Phys. Chem. A 101 (1997) 7154. R. A. Wheeler, J. Phys. Chem. 97 (1993) 1533. E. D. Becker, E. Charney, and T. Anno, J. Chem. Phys. 42 (1965) 942. K. 
Palmo, L.-O. Pietila, and B. Mannfors, J. Mol. Spectrosc. 100 (1983) 368. H. P. Trommsdorff, D. A. Wiersma, and H. R. Zelsmann, J. Chem. Phys. 82 (1985) 48.
689 66. 67. 68. 69. 70. 71. 72. 73. 74. 75. 76. 77. 78. 79. 80.
G. Rauhut and P. Pulay, J. Phys. Chem. 99 (1995) 3093.
M. Nonella and P. Tavan, Chem. Phys. 199 (1995) 19.
M. Nonella, Chem. Phys. Lett. 280 (1997) 91.
G. N. R. Tripathi, J. Chem. Phys. 74 (1981) 6044.
G. N. R. Tripathi and R. H. Schuler, J. Phys. Chem. 87 (1983) 3101.
R. H. Schuler, G. N. R. Tripathi, M. F. Prebenda, and D. M. Chipman, J. Phys. Chem. 87 (1983) 5357.
P. D. Sullivan, J. R. Bolton, and W. E. Geiger, J. Am. Chem. Soc. 92 (1970) 4176.
B. S. Prabhananda, J. Chem. Phys. 79 (1983) 5752.
P. J. O'Malley, Chem. Phys. Lett. 291 (1996) 367.
P. J. O'Malley, J. Phys. Chem. A 101 (1997) 6334.
P. J. O'Malley, Chem. Phys. Lett. 262 (1996) 797.
P. J. O'Malley, J. Phys. Chem. A 101 (1997) 9813.
A. K. Grafton and R. A. Wheeler, Comp. Phys. Comm. 113 (1998) 78.
A. K. Grafton and R. A. Wheeler, J. Comput. Chem. 19 (1998) 1663.
M. R. Razeghifard, S. Kim, J. S. Patzlaff, R. S. Hutchison, T. Krick, I. Ayala, J. J. Steenhuis, S. E. Boesch, R. A. Wheeler, and B. A. Barry, J. Phys. Chem. B 103 (1999) 9790.
81. J.-R. Burie, A. Boussac, C. Boullais, G. Berger, T. Mattioli, C. Mioskowski, E. Nabedryk, and J. Breton, J. Phys. Chem. 99 (1995) 4059.
82. F. MacMillan, F. Lendzian, G. Renger, and W. Lubitz, Biochemistry 34 (1995) 245.
83. P. J. O'Malley and S. J. Collins, Chem. Phys. Lett. 259 (1996) 296.
84. P. J. O'Malley, J. Am. Chem. Soc. 120 (1998) 5093.
85. L. A. Eriksson, F. Himo, P. E. M. Siegbahn, and G. T. Babcock, J. Phys. Chem. A 101 (1997) 9496.
86. F. Himo, G. T. Babcock, and L. A. Eriksson, J. Phys. Chem. A 103 (1999) 3745.
87. J. Gaultier and C. Hauw, Acta Crystallogr. 18 (1965) 179.
88. M. Brenton-Lacombe, Acta Crystallogr. 23 (1967) 1024.
89. M. Bauscher and W. Mantele, J. Phys. Chem. 96 (1992) 11101.
90. J. Breton, J. Burie, C. Berthomieu, G. Berger, and E. Nabedryk, Biochemistry 33 (1994) 4953.
91. G. Balakrishnan, P. Mohandas, and S. Umapathy, J. Phys. Chem. 100 (1996) 16472.
92. J. M. Fritch, S. V. Tatwawadi, and R. N. Adams, J. Phys. Chem. 71 (1967) 338.
93. P. J. O'Malley, Biochim. Biophys. Acta 1411 (1999) 101.
94. J. Silverman, I. Stam-Thole, and C. H. Stam, Acta Crystallogr. B 27 (1971) 1846.
95. H. W. Schmalle, O. H. Jarchow, B. M. Hausen, and K.-H. Schulz, Acta Crystallogr. C 40 (1984) 1090.
96. M. Nonella and C. Brandli, J. Phys. Chem. 100 (1996) 14549.
97. R. C. Prince, P. L. Dutton, and J. M. Bruce, FEBS Lett. 160 (1983) 273.
98. H. H. Robinson and S. D. Kahn, J. Am. Chem. Soc. 112 (1990) 4728.
99. R. Brudler, H. J. M. de Groot, W. B. S. van Liemt, W. F. Steggerda, R. Esmeijer, P. Gast, and A. J. Hoff, EMBO J. 13 (1994) 5523.
100. A. W. Parker, R. E. Hester, D. Phillips, and S. Umapathy, J. Chem. Soc., Faraday Trans. (1992) 2649.
101. M. Bauscher, E. Nabedryk, K. Bagley, J. Breton, and W. Mantele, FEBS Lett. 261 (1990) 191.
102. M. Nonella, J. Phys. Chem. B 102 (1998) 4217.
103. T. N. Kropacheva, W. B. S. van Liemt, J. Raap, J. Lugtenburg, and A. J. Hoff, J. Phys. Chem. 100 (1996) 10433 and references therein.
104. R. I. Samoilova, N. P. Gritsan, A. J. Hoff, W. B. S. van Liemt, J. Lugtenburg, A. P. Spoyalov, and Y. D. Tsvetkov, J. Chem. Soc. Perkin Trans. 2 (1995) 2063.
105. A. P. Spoyalov and Y. D. Tsvetkov, J. Chem. Soc. Perkin Trans. 2 (1995) 2063.
106. P. J. O'Malley, Chem. Phys. Lett. 285 (1998) 99.
107. B. A. Barry, Photochem. Photobiol. 57 (1993) 179.
108. C. Berthomieu, C. Boullais, J.-M. Neumann, and A. Boussac, Biochim. Biophys. Acta 1365 (1999) 112 and references therein.
109. G. Monard and K. M. Merz Jr., Acc. Chem. Res. 32 (1999) 904 and references therein.
AUTHOR INDEX
Adamo, Carlo (Ch. 12) Laboratory for the Structure and Dynamics of Molecules (LSDM) Dipartimento di Chimica Università 'Federico II' via Mezzocannone 4 I-80134 Napoli, Italy
Åqvist, Johan (Ch. 7) Dept. of Cell and Molecular Biology Uppsala University, Biomed. Center Box 596 S-751 24 Uppsala, Sweden http://aqvist.bmc.uu.se
Arthurs, Sandra (Ch. 8) Agouron Pharmaceuticals, Inc. 10777 Science Center Drive San Diego, CA 92121-1111, USA
Barone, Vincenzo (Ch. 12) Laboratory for the Structure and Dynamics of Molecules (LSDM) Dipartimento di Chimica Università 'Federico II' via Mezzocannone 4 I-80134 Napoli, Italy
Blomberg, Margareta R.A. (Ch. 3) Department of Physics Stockholm University Box 6730 S-113 85 Stockholm, Sweden
Bouzida, Djamal (Ch. 8) Agouron Pharmaceuticals, Inc. 10777 Science Center Drive San Diego, CA 92121-1111, USA
Boyd, Russell J. (Ch. 11) Department of Chemistry Dalhousie University Halifax, Nova Scotia Canada B3H 4J3
Carloni, Paolo (Ch. 6) International School of Advanced Studies and INFM-Istituto Nazionale di Fisica della Materia I-34014 Trieste, Italy and International Centre for Genetic Engineering and Biotechnology I-34012 Trieste, Italy
Colson, Anthony B. (Ch. 8) Agouron Pharmaceuticals, Inc. 10777 Science Center Drive San Diego, CA 92121-1111, USA
Cossi, Maurizio (Ch. 12) Laboratory for the Structure and Dynamics of Molecules (LSDM) Dipartimento di Chimica Università 'Federico II' via Mezzocannone 4 I-80134 Napoli, Italy
Eriksson, Leif A. (Ch. 4 & 11) Department of Quantum Chemistry Uppsala University Box 518 S-751 20 Uppsala, Sweden http://www.kvac.uu.se/eng/staff/LeifEriksson.html
Freer, Stephan T. (Ch. 8) Agouron Pharmaceuticals, Inc. 10777 Science Center Drive San Diego, CA 92121-1111, USA
Gehlhaar, Daniel K. (Ch. 8) Agouron Pharmaceuticals, Inc. 10777 Science Center Drive San Diego, CA 92121-1111, USA
Henriques, E.S. (Ch. 13) CEQUP - Departamento de Química Faculdade de Ciências da Universidade do Porto Rua do Campo Alegre, 687 4169-007 Porto, Portugal
Higgs, Christopher (Ch. 9) Department of Biological Sciences Central Campus University of Essex Wivenhoe Park Colchester, Essex, CO4 3SQ, U.K.
Himo, Fahmi (Ch. 4) Department of Physics Stockholm University Box 6730 S-113 85 Stockholm, Sweden
Karancsi-Menyhárd, D. (Ch. 2) Department of Theoretical Chemistry Loránd Eötvös University Pázmány Péter st. 1A H-1117 Budapest, Hungary
Keserű, G. (Ch. 2) Department of Theoretical Chemistry Loránd Eötvös University Pázmány Péter st. 1A H-1117 Budapest, Hungary and Chemical and Biotechnological Research and Development Gedeon Richter Pharmacochemical Works P.O. Box 27 H-1475 Budapest, Hungary
Kolmodin, Karin (Ch. 7) Department of Cell and Molecular Biology Uppsala University, Biomedical Center Box 596 S-751 24 Uppsala, Sweden
Larson, Veda (Ch. 8) Agouron Pharmaceuticals, Inc. 10777 Science Center Drive San Diego, CA 92121-1111, USA
Luty, Brock A. (Ch. 8) Agouron Pharmaceuticals, Inc. 10777 Science Center Drive San Diego, CA 92121-1111, USA
Luzhkov, Victor (Ch. 7) Department of Cell and Molecular Biology Uppsala University, Biomedical Center Box 596 S-751 24 Uppsala, Sweden
Marrone, Tami (Ch. 8) Agouron Pharmaceuticals, Inc. 10777 Science Center Drive San Diego, CA 92121-1111, USA
Melo, A. (Ch. 13) CEQUP - Departamento de Química Faculdade de Ciências da Universidade do Porto Rua do Campo Alegre, 687 4169-007 Porto, Portugal
Mulholland, Adrian J. (Ch. 14) School of Chemistry University of Bristol Cantock's Close Bristol BS8 1TS, United Kingdom http://www.bris.ac.uk/Depts/Chemistry/staff/amulholl.htm
Náray-Szabó, Gábor (Ch. 2) Department of Theoretical Chemistry Loránd Eötvös University Pázmány Péter st. 1A H-1117 Budapest, Hungary http://theop8.chem.elte.hu/naray/index.htm
Olsson, Mats H.M. (Ch. 1) Department of Theoretical Chemistry Lund University, Chemical Center P.O. Box 124 S-221 00 Lund, Sweden
Pastor, Nina (Ch. 10) Facultad de Ciencias U. Autónoma del Estado de Morelos Av. Universidad 1001, Col. Chamilpa 62210 Cuernavaca, Morelos, Mexico
Pierloot, Kristine (Ch. 1) Department of Chemistry University of Leuven Celestijnenlaan 200F B-3001 Heverlee-Leuven, Belgium
Radom, Leo (Ch. 5) Research School of Chemistry Australian National University Canberra, ACT 0200, Australia http://www.rsc.anu.edu.au/~radom/
Ramos, Maria J. (Ch. 13) CEQUP - Departamento de Química Faculdade de Ciências da Universidade do Porto Rua do Campo Alegre, 687 4169-007 Porto, Portugal
Rega, Nadia (Ch. 12) Laboratory for the Structure and Dynamics of Molecules (LSDM) Dipartimento di Chimica Università 'Federico II' via Mezzocannone 4 I-80134 Napoli, Italy
Rejto, Paul A. (Ch. 8) Agouron Pharmaceuticals, Inc. 10777 Science Center Drive San Diego, CA 92121-1111, USA
Reynolds, Christopher A. (Ch. 9) Department of Biological Sciences Central Campus University of Essex Wivenhoe Park Colchester, Essex, CO4 3SQ, U.K. http://www.essex.ac.uk/bcs/staff/reync/
Rose, Peter W. (Ch. 8) Agouron Pharmaceuticals, Inc. 10777 Science Center Drive San Diego, CA 92121-1111, USA
Röthlisberger, Ursula (Ch. 6) Laboratory of Inorganic Chemistry ETH Zurich CH-8092 Zurich, Switzerland http://www.rac.ethz.ch
Ryde, Ulf (Ch. 1) Department of Theoretical Chemistry Lund University, Chemical Center P.O. Box 124 S-221 00 Lund, Sweden http://signe.teokem.lu.se/~ulf/
Schaffer, Lana (Ch. 8) Agouron Pharmaceuticals, Inc. 10777 Science Center Drive San Diego, CA 92121-1111, USA
Siegbahn, Per E.M. (Ch. 3) Department of Physics Stockholm University Box 6730 S-113 85 Stockholm, Sweden
Smith, David M. (Ch. 5) Research School of Chemistry Australian National University Canberra, ACT 0200, Australia
Verkhivker, Gennady M. (Ch. 8) Agouron Pharmaceuticals, Inc. 10777 Science Center Drive San Diego, CA 92121-1111, USA http://www.agouron.com
Weinstein, Harel (Ch. 10) Department of Physiology and Biophysics Mount Sinai School of Medicine One Gustave L. Levy Place New York, NY 10029, USA http://icb.mssm.edu
Wetmore, Stacey D. (Ch. 5 & 11) Department of Chemistry Dalhousie University Halifax, Nova Scotia Canada B3H 4J3 and Research School of Chemistry Australian National University Canberra, ACT 0200, Australia
Wheeler, Ralph A. (Ch. 15) Department of Chemistry and Biochemistry University of Oklahoma 620 Parrington Oval, Room 208 Norman, OK 73019 USA
SUBJECT INDEX
2-methyleneglutarate mutase 191, 193 3₁₀-helix 346 Ab initio molecular dynamics (AIMD) 215, 216, 218 Acceptance ratio 308 Accessible surface area 552 Active region methods 557 site 543, 601, 632 Activation barrier 241 energy 633 Addition-elimination mechanism 189, 194, 196, 199, 203 Adenosine triphosphate (ATP) 656 Adenylyl cyclase 366 Adiabatic connection method 477 AIDS 233, 544 Alcohol dehydrogenase Cu substituted 11 α-carbon template 347 AM1 570, 577, 628 Aminomutases 185 Antibiotics 133 Asp dyad (HIV-1 PR) 230 Atomization energies 473 ATP 658 Azurin 6 Co(II) substituted 9 B3LYP 97, 147, 192, 411, 470, 477, 571, 661 Back-bonding 69 Bacterial photosynthetic reaction centers 658 Bacteriorhodopsin (bR) 341, 342, 346
Becke-Perdew86 12 Benchmark G2 and G3 98 β-antagonists 346 Binding energy landscape 299, 319, 323 free energy 291, 557, 561, 565 interactions 604 Biomimetics 236 Blue copper proteins 1-56 Tetragonal structures 9 Trigonal structures 8 BLYP functional 473 Boltzmann distribution 307 constant 490 population 493, 518 Bond Cu-SMet 29, 47 Strength 147, 162 Born-Oppenheimer surface 217, 556 Bound free-radical hypothesis 186, 187, 198 Butterfly conformation 416 C-terminus 350, 378 Calculations Ab initio 599 Monte Carlo 63 cAMP-dependent protein kinase 644 Captodative effect 160, 517 Car-Parrinello molecular dynamics 215-254 DNA fragments 219 Heme proteins 219 Ion channels 220 Nanotubes 219 Photosensitive proteins 220
Carbon-fixation 656 Carbon-skeleton mutases 183, 185, 205 CASSCF 661 -CASPT2 3 Catalytic cycles 621 effect 635 properties 597 triad 225 CBS-RAD(p) method 206 CCSD(T) 192, 193 Charge-dipole interactions 227, 232 Charge transfer Ligand-to-metal 18 Chorismate mutase 645 Citrate synthase 635 Claisen rearrangement 645 cobalamin 183, 184 Cobalt-carbon bond 185 Coenzyme A 165 B12 183-214 Complete Basis Set method (CBS) 192, 193 Complex di-iron (RNR) 170 enzyme-inhibitor 626 enzyme-ligand 570 ligand-protein 290 MetMb-NO 67, 69 Pre-initiation 400 Reactive 632 Receptor-ligand 552 Stabilization 398 TBP-DNA 380, 397, 401 Water oxidizing 100 Zinc, in HCAII 221 Computational co-crystallisation 555 Conformational analysis 509 changes 621
space 558 substates 290, 327 transition 396 conjugate peak refinement algorithm 619 connection atoms 613 Connolly surface 314, 498 Constitutive activation 359 Continuum electrostatic methods 559, 560, 566, 572 Cooperativity effects 293 Coordinate driving 619 Copper proteins Axial type 1 11 Rhombic type 1 11, 21 Type 1.5 17 Type 2 17, 23 Core core terms 609 correlation 25 Counterion condensation theory 381 Cross-link DNA-DNA 409 DNA-protein 409, 459 Cys-Tyr in GO 155 Crystal packing 421 Crystal structure predictions 317, 318 CuA site 1, 33, 37 CuZ site 37 Cucumber basic protein 15, 22 Cupredoxins 1 Cutoff, nonbonded 613, 627 Cytochrome oxidase 32, 107 Binuclear center 110 Potential energy surface 117 Tyr-His cross linkage 110, 116 Cytochromes 1, 37 Cytoplasmic loop 350 Cytosine monohydrate 424 Dehydration 396
Density functional theory 71, 469, 568, 571, 576, 599, 612 Density of states 310 Deuterium isotope frequency shifts 684 Dielectric Effects 173 properties 45 diffusion-controlled processes 561 diradical pair 426 direct readout 382, 386 Disorders 62 Disproportionation reaction 129 Dissociation NO 78 rate 71 Distal side effects 67 DNA Anions 453 Bases, numbering 412 bending 387 cations 449 flexibility 390, 397 hydration 427, 443, 445 irradiated 438, 455 long-range hole transfer 448 minicircles 393 multi-component model 409, 456, 459 oligomers 389, 390 radiation damage to 409-466, 513 radiation products 411 random/oriented fibers 442 strand breaks 430, 439, 444, 456 two-component model 439, 445, 459 Domain swapping 363 Drug design, structure-based (SBDD) 540, 541, 549 Dual specificity phosphatases (DSPases) 276
Cdc25A 276 Dynamics Activated 623 Brownian 356, 559, 560 Internal 62 Molecular (MD) 59, 256, 349, 387, 550, 554, 555, 621, 624 Newtonian 63 Protein 623 Quantum 625 Side-chain 63, 401 Solvent 622 Effective core potentials (ECP) 97 Electric field 69 Electrophilic aromatic substitution 632 Electron affinities 440, 441 correlation 639 density 470 loss centers 438 transfer, in DNA 448, 458 Electronic spectra 17 Electrostatic Attraction 366 effect 70 hydration free energies 313 potential 362 Eliminases 185 Empirical valence bond (EVB) method 256, 575, 602 Energy-based methods 543 Enolase 640 Ensemble average 563, 564 Entatic state theory 2, 6 Entropy effect, O₂ cleavage 127 Enzyme-catalysed reactions 598 Enzyme inhibitor 598 binding 553 interactions 312 Enzymes 597
Equilibrium isotope effect 280, 283, 285 Euler-Lagrange equations 216, 217 Evolutionary algorithm 304, 305 Evolutionary trace method 362 Exchange-correlation energy 470 Farnesyl 115 Fast multipole method (FMM) 503 Fenton reaction 137 Flavin cofactor 631 Flavoprotein monooxygenases 531 Folding 65 Force field 570 AMBER 302, 388, 506, 520, 548, 576, 577 CHARMM 388, 544 DREIDING 302 Polarizable 606 Fragment linkage method 546, 547 Fragment SCF method 611 Fragmentation-recombination mechanism 194, 195, 199, 201 Free energy Activation 621, 622 calculations 395 component analysis 565 derivatives method 566 of association 551 perturbation 11, 30, 256, 275 profile 620 Solvation 504 Frozen orbitals 574 Functional group 601 G2 methodologies 192, 193 G3(MP2)-RAD(p) 192, 205 Gαβγ heterotrimer 361 GDP/GTP binding site 361 g-factors 146 G-protein coupled receptors 341-376
Database (GPCRDB) 347 Dimerisation 363 Ligand binding domains 351 g-value calculations 156 Galactose oxidase (GO) 149, 236 Active site 150, 237 Catalytic mechanism 151, 238 Energy surface 158 Gene expression, regulation of 400 Generalized gradient approximation (GGA) 471 mGGA functional 480, 481 generalized hybrid orbital method 611 GIAO 484 Glutamate mutase 187, 200 Glycolysis, anaerobic 95 Gonadotrophin 345 Hamiltonian, effective 607 Heme Peroxidases 118 pocket 78, 83 Hemoglobin 89 Herpes simplex virus type 1 thymidine kinase 234 High-affinity agonist-receptor-G protein complex 341 HIV-1 protease 229, 289-340, 544, 548, 639 Inhibitor binding dynamics 312 Inhibitor SB203386 296, 314 Mutants 295, 321 Two-step mechanism 297 HIV-1 reverse transcriptase 233 Homology modelling 342, 346, 541, 578 Human carbonic anhydrase II 221, 570 Hybrid techniques 605
Hydride ion removal 204 transfer 641, 642
Hydrogen atom Adducts 423, 524 Removal 425 transfer 112, 153, 162, 174 Hydrogen bond Effects of 415, 422, 521 Low barrier (LBHB) 225, 228, 230, 234, 577, 636, 639 Short-strong (SSHB) 577, 636 Weak/strong 208, 636 Hydrogen peroxide activation 119 Hydrolysis Associative and dissociative pathways 281, 284 Phosphate ester 258, 279 Phosphoenzyme 268 Hyperfine coupling constants (HFCC) 146, 174, 410, 414, 482, 494, 515, 527, 655, 664, 665, 669, 677, 682 IMOMM 151 Induced rack theory 2, 6 Inhibitors 229, 293, 543, 560, 562 Binding modes 328 Native binding mode 319 Reduced binding affinity 294 Selectivity 315 Interaction Core-core 610 Electrostatic 67, 541, 608 Energy 617 Enzyme-ligand 539-596 Hydrophobic 541 Receptor-G-protein 359 Solute-solvent 496, 510, 520 Van der Waals 295, 478, 541, 608 Intrinsic reaction path 491 Ionisation potential 440, 452 IR spectra 507, 508 Iron-sulphur clusters 1, 40
Rubredoxins 40 Ferredoxins 40 Isopenicillin N synthase 133 Fe=O intermediate 137 Reaction mechanism 135 Isoprenoid chain 670, 674, 680 Jahn-Teller instability 9, 24 Kinetic Energy density 479, 481 Isotope effect 279, 283, 285, 640 Ping-pong (PFL) 159 Krebs cycle 197 Lactate dehydrogenase 641 Lagrangian 216, 217 λ-dynamics approach 306, 567 Langevin-dipole model 559, 573, 576 Large amplitude path (LAP) 488, 490, 492 Leghemoglobin 88 Lennard-Jones parameters 616 Lexitropsins 383 Lieb-Oxford limit 472, 474, 475 Ligand Axial 31, 38 Assembly 545 Binding 351 Differentiation 66 Induced conformational change 278 Migration 86 Models 24, 119 Radical 137, 138 Ligand-protein Binding dynamics 298 Binding energetics 292
Docking simulations 304, 305, 314 Docking techniques 300 Thermodynamics 306 Linear interaction energy (LIE) method 269, 566 Link atoms 545, 574, 611, 612 Lipid bilayer 358 Lysozyme 42, 599 Local reaction field method 263 Local self-consistent field (LSCF) method 611 Low-energy basins 322 Low molecular weight PTPase 263 Concerted pathway 265 D129A mutant 271 Energy profile 264, 266, 267, 269 Malate dehydrogenase 641 Manganese catalase 128 Binuclear manganese cluster 129 Energy diagram 133 Ferromagnetic coupling 130 Reaction mechanism 130 Melanocortin 343 Menaquinone 658, 666, 674, 675 Methane monooxygenase 121 Antiferromagnetic state 125 Bridging carboxylates 125 Compound Q 122 Ferromagnetic states 124, 125 Proposed mechanism 126 Methylmalonyl-CoA mutase 197 Metropolis method 307, 556 Michaelis complex 271 Minor groove 383, 397, 399 Molecular docking 298, 548, 578 modelling 542 mechanics (MM) 553, 599 recognition 289
Møller-Plesset calculations 440 Monohydrate crystals 424, 428, 429, 450 Monte Carlo simulations 289-340, 346, 555 Simulated annealing 299, 343, 549, 550 Motion Rigid body 358 Thermal 60 MNDO 601, 628 MNDO/H 577 Mn-O cluster 103 Multiple histogram equations 311 Myoglobin 57-94 N-terminus 365, 378 NADH 121 NADPH 658 Naphthoquinone 674 Native binding domain 316 Neuraminidase 644 Nitrite reductase 15, 17, 21 Nitrous oxide reductase 33 NMR shieldings 484, 486 Normal modes 488, 489 Nucleotide binding domain 360 ONIOM method 498, 504, 606 O₂ Activation 107, 118, 121, 128, 133, 571 Bond strength 96 Formation and cleavage 95-144 Out-of-plane rotation 418, 517 oxidative damage 128 Oxygen transport 57, 66 Oxyl radical mechanism 104 Packing
Angle 348 Helix-helix 364 Ridges-in-grooves 348 Papain 643 Para-benzoquinone 665, 668, 685 Para-hydroxybenzoate hydroxylase (PHBH) 631 Parametrization 606 Partial-proton-transfer 206 Partial rational function optimization 618 Partitioning schemes 610 Peptide receptors 353 Perdew-Burke-Ernzerhof (PBE) functional 472, 474, 481 PBE0 hybrid functional 478, 479, 481, 483, 508, 530 revPBE, RPBE functionals 475, 476, 481 Perdew-Wang-Perdew 86 functional (PWP86) 411 Pharmaceutical lead compound 598 Pharmacophores 228 Phenol hydroxylase 635 Phosphate binding loop (P-loop) 254, 267 Phosphorylation 234, 253 Photodissociation 73, 75 Photosynthesis 655 π-spin density 431 Piecewise linear energy function (PL) 303 pKa calculations 505 Plane wave basis set 218 Plant photosystem II (PSII) 657, 683 Plastocyanin 6, 18 Plastoquinone 657, 666, 670, 673 PM3 628, 672 P-O bond cleavage 265 Point-charge 4, 607, 608 Poisson-Boltzmann 559 equations 5
linearized, method 70 Polarizable continuum model (PCM) 4, 468, 496, 498, 576 CPCM 499, 509, 530 DPCM 128, 499, 518, 529 IEFPCM 499 Polarization 606, 616 Population analysis 469 Porphine 38 Potential energy surface 30, 553, 619, 642 Potential of mean force 5, 309, 311, 395, 565, 620 Probability Density 311 distribution 310 projection map 345, 347 promoter sequences 388 protein conformations 320, 322, 324 Protein data bank (PDB) 301, 380, 544 Protein-dipole Langevin-dipole method 5 Protein-solvent interface 64 Protein strain 33, 42 Protein tyrosine phosphatases (PTPases) 253-288 Active site 254 C17S mutant (PTP1B) 272, 274 Reaction mechanism 255 Proton affinity 223 gradient 656 hopping 226 transfer 152, 224, 273, 451, 454, 641, 642 Protonation Effect of 196, 199, 204 states 627, 637 Pseudoazurin 22 Pseudobond method 613
Pyruvate formate-lyase (PFL) 158, 571 Oxidative degradation 167 Proposed mechanism 161, 166 QC/MM 5 QCISD(T) 192, 193 QM 568 QM/CD 569 QM/CM 540, 569, 572, 578 QM-FE 577 QM/MM 221, 504, 523, 569, 574, 577, 597-654 Ab initio 608, 615, 628, 635, 637 classification 605 coupling 605 DFT based 628, 686 implementations 614 interaction energy 608 LSCF method 575, 576 Semiempirical 609, 611, 628 Quantitative structure-activity relationships (QSAR) 235, 539, 551, 633 Quinones 655 Quinone-protein interactions 671 Radical 1,2-shift 188 3-propanal 200 5'-deoxyadenosyl 186 Allyl 483 Aminopropyl 201 anion, semiquinone 665, 674, 677, 683 Deoxyribose 413, 452 DNA base 411 Enzymes 145-181 Fluoromethyl 494 Glycyl 148, 159, 514, 519, 523 H₂CO⁺ 483 Iminopropyl 202
Ketyl intermediate 155 Methyl 482, 495 Methylcyclopropane cation 197 Hydroxyl, OH 399, 411, 426, 427, 446, 458 Imidoxyl 529 NO-centered 514 p-Cresyl 660 Phenoxyl 660, 663, 685 Phosphate centered 445, 456 Planar vs puckered 420, 422, 495 Primary, in DNA 438, 447 Protonated 420, 421 Proxyl 529 Puckering modes 430 Purine 413, 417, 514, 524 Pyrimidine 413, 417, 514, 524 Quinoidal 655-690 Rearrangement mechanism (B12) 188 Ring breaking 436 Secondary 441, 447, 449, 453, 458 Sugar 429, 437, 443 Sulfinyl 168 Thymyl 524 Transfer pathway (RNR) 170, 173 Tyrosyl 660 Tyr122 in RNR 170 Tyr272 in GO 149, 157, 239 TyrZ in PSII 100 Radiation Damage 428, 457, 513 products 413, 429 Raman spectra 507, 508 Ramachandran map 506 trajectory plot 278 rate-limiting step 634, 636 Reaction
coordinate 620, 624, 634 enthalpy 191 free energies 257 mechanisms 630 pathway 618, 619, 638 Receptor 543 5-HT 357 Agonist/antagonist bound 357 Amine 352 α1B-adrenergic 357, 358 α2-adrenergic 342, 349, 352, 360, 363 Bradykinin 355 Cannabinoid 354 Chimeric 363 Dopamine D2 342 Tachykinin 355 Vasopressin 354 Recognition anchors 301 Recombination 77, 79 Reductases 185 Reduction potential 26, 28 Regulatory mechanism of binding 291 Relaxation 76 Renin (aspartyl protease) 543 Reorganisation energy 26, 35, 37 Inner-sphere 26 Outer-sphere 26 Rhodopsin 345, 346, 349 Ribonucleotide reductase (RNR) 169, 571 Energy surface 177 Proposed mechanism 171, 175 Rieske iron-sulphur site 42 Ring-opening 190, 435 Ring puckering, sugar radical 431 Rotations Methoxy groups 679 Methyl groups 419 Rotational barrier 434 Hamiltonian 492
Scoring functions 550, 551 Self interaction correction 480 Semiempirical methods 568, 572, 576, 599, 609 Sequence specific recognition 382, 400, 401 Serine proteases 225, 547 Serotonin 343, 346 Sequence Alignment 344 Motifs 344 Simulated annealing 554 Single crystals 410, 447, 450, 455 Singular value decomposition 65 SIV protease 296 Software Amber 4 COMQUM 5, 14 deMon 411 Gaussian 94 and 98 3, 97, 280, 411, 481, 571 MEAD 5 MOLCAS 4 Mulliken 3 Q 262 Turbomole 3 Solvation 557, 572 effects 13 energy 31 model SCI-PCM 98, 498 SCRF model 283 Spin Contamination 659 Density 415, 416, 662, 665, 669, 676, 682 Labelling 356, 520 population 114, 154, 157, 163, 172 unpaired 145, 239 Spin state change 137 Stellacyanin 6, 24 Stereoselectivity 559
Strain 43 energy 16, 39 Substrate 645 structure-based thermodynamics method 292 Substrate trapping (PTPase) 274 Surrounding, effect of 223, 467-538 x-functional 480 TATA Box binding protein (TBP) 377-408 Recognition sequence 377, 380 TBP associated factors (TFA) 378 binding to DNA 379, 381 directionality of binding 393 DNA interface 385, 386, 396 Dynamics 398, 399 sequence alignment 384 tesserae 498, 502 Tetrahedral intermediate (PFL) 161, 163 Thermodynamic Cycle 561, 562 Integration 562, 563 Perturbation 562, 564 Thioester exchange (PFL) 164 Torsional motion 511, 681 Transcription 377 Transducin 360, 365 Transition state 598, 601, 635, 643 Rate determining (GO) 153 stabilization 645 structure 618, 637 theory 622 Triosephosphate isomerase 603 Tunnelling 625, 640 Ubiquinone 658, 666, 677 Umbrella sampling 622, 640, 644 Uniform electron gas 471
Valence bond states, PTPases 261 van der Waals parameters 617, 630 Vibrational averaging 488, 494, 520, 524, 527 frequencies 662, 667, 668, 676, 680 modes 663, 675 projectional analysis 671, 672 Water, irradiation of 445 oxidation of 453 TIP3P model 615 Watson-Crick hydrogen bonds 379 Weighted histogram analysis method 309 Wrinkled-DNA conformations 389 Zwitterion α-alanine 509, 510 glycyl radical 514, 515