value, with the POL+ result being somewhat larger. The source of these underestimations can be found by looking at the individual components. The,,p, (p,,,,,) components are very different from those derived from the Dunning augmented basis sets, and the results from using the Sadlej basis even have a different sign. Some other basis sets in Table 4 are known to be poor, and this again can be seen by looking at the individual components. The specialty 6-31 +PD basis set, for example, gives a reasonable
value, but the ,p,, component has the wrong sign.
Figure 7 First hyperpolarizability (β) of HCN as a function of basis set. [Plot not reproduced. Horizontal axis: level of augmentation, 0 to 3; the Sadlej basis set result is marked separately.]
To illustrate the size of typical NLO basis sets in use, we show the number of Gaussian functions for several of the common basis sets for two molecules: acetylene and p-nitroaniline (Table 5). Most MO programs in use implement 5 d functions and 7 f functions, with one exception that uses 6 d functions and 10 f functions. The difficulty of predicting accurate NLO properties is clearly apparent when calculations on the fairly small molecule p-nitroaniline could require around 1000 basis functions to overcome basis set deficiencies. From the very limited results presented here, it is impossible to declare that one particular basis set is the best. The choice of basis set is often a compromise between quality and time or cost. It is important to note that the type of basis set needed depends on the nature of the system being studied, and basis sets as large as those mentioned here may not always be needed to provide accurate results. For very delocalized π systems in large polyenes, for example, it has been shown that even the smaller, "normal" basis sets work well.
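The basis set sizes quoted for the two molecules follow directly from the contraction patterns and the 6d/10f versus 5d/7f conventions. The short sketch below counts functions from a contraction string; the function names and the string format are illustrative choices made here, and the contraction patterns used are the 6-31G*-type and Sadlej-type entries listed in Table 5.

```python
import re

# Functions per shell: Cartesian (6 d, 10 f) versus spherical (5 d, 7 f).
CARTESIAN = {"s": 1, "p": 3, "d": 6, "f": 10}
SPHERICAL = {"s": 1, "p": 3, "d": 5, "f": 7}


def functions_per_atom(contraction, shell_sizes):
    """Count basis functions for a contraction string such as '5s3p2d'."""
    shells = re.findall(r"(\d+)([spdf])", contraction)
    return sum(int(count) * shell_sizes[kind] for count, kind in shells)


def basis_size(heavy, hydrogen, n_heavy, n_hydrogen):
    """Return (Cartesian, spherical) totals for a molecule.

    heavy / hydrogen are the contraction strings for C, N, O and for H.
    """
    totals = []
    for sizes in (CARTESIAN, SPHERICAL):
        totals.append(n_heavy * functions_per_atom(heavy, sizes)
                      + n_hydrogen * functions_per_atom(hydrogen, sizes))
    return tuple(totals)


# Acetylene, C2H2: 2 heavy atoms and 2 hydrogens.
print(basis_size("3s2p1d", "2s", 2, 2))     # 6-31G*-type contraction
print(basis_size("5s3p2d", "3s2p", 2, 2))   # Sadlej-type contraction
# p-Nitroaniline, C6H6N2O2: 10 heavy atoms and 6 hydrogens.
print(basis_size("5s3p2d", "3s2p", 10, 6))
```

The per-atom counts make the cost of each added diffuse d or f shell explicit: one extra d shell on every heavy atom of p-nitroaniline adds 60 Cartesian functions.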
Table 5. Basis Set Sizes for (6d/10f)/(5d/7f) Implementations

                                       Basis Functions
Basis Set        C, N, O/H             Acetylene    p-Nitroaniline
                                       C2H2         NO2C6H4NH2
6-31G*           3s2p1d/2s             34/32        162/152
6-31G+PD         3s3p1d/2s             40/38        192/182
Sadlej           5s3p2d/3s2p           70/66        314/294
POL+             5s3p2d/3s2p1d         82/76        350/324
aug-cc-pVDZ      4s3p2d/3s2p           68/64        304/284
d-aug-cc-pVDZ    5s4p3d/4s3p           90/86        428/398
t-aug-cc-pVDZ    6s5p4d/5s4p           124/116      553/513
aug-cc-pVTZ      5s4p3d2f/4s3p2d       156/138      680/598
d-aug-cc-pVTZ    6s5p4d3f/5s4p3d       214/188      930/812
t-aug-cc-pVTZ    7s6p5d4f/6s5p4d       272/238      1180/1026

Other Considerations
Semiempirical molecular orbital methods have long been a mainstay for the prediction of NLO properties, particularly with sum-over-states methods, and much of our understanding of how molecular properties affect NLO properties is deduced from them.5,80 However, in most instances the failure of semiempirical methods to provide reasonable NLO properties for small molecular systems has been noted.2,81 The primary cause of the problems in these systems is traced to the minimal basis sets used in semiempirical methods. As noted earlier, smaller basis sets may be adequate for many larger molecular systems, such as polyenes, and for these systems one expects that semiempirical methods might suffice.82

The role of electron correlation in the prediction of NLO properties has also been shown to be very important in several studies. Unfortunately, as already noted, most currently employed methods do not give correlated frequency-dependent properties, and most studies of electron correlation have been done using finite field approaches. The magnitude of the correlation correction varies widely. For example, for large polyenes like C24H28, it has been shown that the MP2 γ may be twice the RHF value.85 At present, the only prospect for routinely treating large systems rests on the hope that the essential trends and relationships are adequately captured at the RHF level, even though the values themselves may not be accurate. As an alternative to Hartree-Fock semiempirical and ab initio calculations, density functional theory has been used to obtain nonlinear optical properties in both the finite field86,87 and TDHF88,89 (or time-dependent Kohn-Sham) approaches.
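Because most correlated studies rely on finite field approaches, a minimal sketch of the idea may be useful here: the energy is evaluated at a few static field strengths and differentiated numerically. The energy function below is a toy polynomial with made-up coefficients (not data for any molecule), the names are hypothetical, and the five-point difference formulas are one common choice, assuming the expansion E(F) = E0 - μF - αF²/2 - βF³/6 - γF⁴/24 along a single axis.

```python
def model_energy(field):
    """Toy energy E(F) = E0 - mu*F - alpha*F^2/2 - beta*F^3/6 - gamma*F^4/24.

    The coefficients are invented for illustration (atomic units assumed).
    In practice each call would be a separate electronic structure calculation.
    """
    e0, mu, alpha, beta, gamma = -93.0, 1.2, 17.0, -8.0, 2000.0
    return (e0 - mu * field - alpha * field**2 / 2.0
            - beta * field**3 / 6.0 - gamma * field**4 / 24.0)


def finite_field_properties(energy, step=0.005):
    """Estimate mu, alpha, beta along one axis from energies at 0, +-F, +-2F."""
    e0 = energy(0.0)
    ep1, em1 = energy(step), energy(-step)
    ep2, em2 = energy(2.0 * step), energy(-2.0 * step)
    odd1, odd2 = ep1 - em1, ep2 - em2      # pick out odd powers of F
    even1, even2 = ep1 + em1, ep2 + em2    # pick out even powers of F
    # The combinations below cancel the next-higher term in each series.
    mu = -(8.0 * odd1 - odd2) / (12.0 * step)
    alpha = -(16.0 * even1 - even2 - 30.0 * e0) / (12.0 * step**2)
    beta = -(odd2 - 2.0 * odd1) / (2.0 * step**3)
    return mu, alpha, beta


mu, alpha, beta = finite_field_properties(model_energy)
print(mu, alpha, beta)
```

With a real energy function the field step is a compromise: too small and the differences are lost in the convergence noise of the energies, too large and higher-order terms contaminate the estimates.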
BEYOND MOLECULAR ELECTRONIC CALCULATIONS
Ultimately the computational chemist wants to make the model theories as realistic as possible, and the next step in the computation of NLO properties is not just to obtain more accurate and more efficient methods for the electronic properties of isolated (0 K) molecules, but to enlarge the theory to treat realistic NLO systems, which are dynamic, interacting molecules. We now consider aspects of computing NLO properties more closely related to experimentally measured properties.
Molecular Vibrational Calculations
The methods discussed above are, in general, concerned only with obtaining the electronic contribution to polarizabilities and hyperpolarizabilities. A complete treatment of the problem requires inclusion of the vibrational and rotational contributions as well. For many experiments at visible frequencies, these effects may be small. For low frequency or static field experiments, however, these effects have been shown to be as large as or larger than the electronic effects themselves.90 Bishop and Kirtman developed a general approach for calculating the vibrational contributions for polyatomic systems.91,92 A recent overview of this subject can be found in the review by Kirtman and Champagne.93
Condensed Phase Problems
The methods discussed so far apply to single molecules only. The data derived from those calculations should be comparable to those from experiments done at low pressure on pure gases. However, most interest in NLO properties is in condensed phase systems (liquids, polymer films, or crystals). A major area of theoretical interest has been solvent effects, and several techniques have been applied to the calculation of NLO properties.94-97 The most common (and simplest) method is the reaction field model, where the solute molecule is in a cavity of solvent, which is treated as a uniform dielectric medium. Cavity approaches are problematic. How do you pick the cavity size? How do you pick the cavity shape? How do you model stronger, specific interactions (such as hydrogen bonding)? The work of Willetts and Rice94 illustrated the inability of reaction field models to adequately treat solvent effects even though they tried both spherical and ellipsoidal cavities. Mikkelsen et al.96 attempted to provide specific interactions with their solvent model by explicitly including solvent molecules inside the cavity. These and related issues need to be addressed further if computational chemists are to develop truly useful procedures capable of including solvent effects in NLO calculations. Recent work by Cammi, Tomasi, and co-workers98-100 has attempted to address these issues within the polarized continuum model (PCM) and has included studies of frequency-dependent hyperpolarizabilities.

Another way to treat condensed phases is to explicitly study intermolecular interaction effects on NLO properties. Interesting attenuation effects in NLO properties arising from interchain interactions have been shown to exist for interacting ethylene molecules101 and for butadiene and hexatriene molecules held in the alignment corresponding to polyacetylene stretched fibers.95 Another example is found in variable position studies of interacting H(C2H2)nH molecules (with n from 1 to 6).102 More work is needed in this area of treating environmental effects before robust NLO property predictions can be made.
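The question of cavity size raised above for reaction field models can be made concrete with the simplest textbook case. The sketch below uses the Onsager expression for the reaction field acting on a point dipole at the center of a spherical cavity, R = [2(ε - 1)/(2ε + 1)] μ/a³ (atomic units, solute polarizability neglected). It is offered only as an illustration of the a⁻³ dependence, not as the model used in any of the studies cited; the function name and the numerical values are assumptions chosen here.

```python
def onsager_reaction_field(dipole, epsilon, radius):
    """Onsager reaction field on a point dipole in a spherical cavity.

    dipole  : solute dipole moment (atomic units)
    epsilon : static dielectric constant of the continuum solvent
    radius  : cavity radius (bohr)
    """
    factor = 2.0 * (epsilon - 1.0) / (2.0 * epsilon + 1.0)
    return factor * dipole / radius**3


# Illustrative values only: a dipole of 1.5 a.u. in a strongly polar solvent,
# with several plausible choices of cavity radius.
for radius in (5.0, 6.0, 7.0, 8.0):
    field = onsager_reaction_field(1.5, 36.0, radius)
    print(f"a = {radius:.1f} bohr  ->  reaction field = {field:.5f} a.u.")
```

Because the field scales as the inverse cube of the radius, a modest change in the assumed cavity size changes the field, and hence any field-induced property change, substantially.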
SUMMARY
The implementation and limitations of three common methods for obtaining NLO properties (finite field, sum over states, and time-dependent Hartree-Fock) have been discussed, and a very brief introduction has been given to the new methods under development. Progress is being made toward results that relate directly to common experimental measurements, but there is still a long way to go. Theoretical and computational NLO work will clearly continue to be important.
ACKNOWLEDGMENTS
During the writing of this chapter, we benefited from many useful comments by Bernard Kirtman (University of California, Santa Barbara), Shasi Karna (U.S. Air Force Phillips Lab), Tom Cundari (University of Memphis), and the editors of this series.
REFERENCES
1. D. M. Bishop, in Advances in Quantum Chemistry, J. R. Sabin and M. C. Zerner, Eds., Academic Press, San Diego, CA, 1994, Vol. 25, pp. 3-48. Aspects of Non-Linear-Optical Calculations.
2. D. P. Shelton and J. E. Rice, Chem. Rev., 94, 3 (1994). Measurements and Calculations of the Hyperpolarizabilities of Atoms and Small Molecules in the Gas Phase.
3. M. Ratner, Int. J. Quantum Chem., 43, 5 (1992). Electronic Structure Studies of Nonlinear Optical Response in Molecules: An Introduction. (Plus rest of issue.)
4. J. L. Brédas, C. Adant, P. Tackx, A. Persoons, and B. M. Pierce, Chem. Rev., 94, 243 (1994). Third-Order Nonlinear Optical Response in Organic Materials: Theoretical and Experimental Aspects.
5. D. R. Kanis, M. A. Ratner, and T. J. Marks, Chem. Rev., 94, 195 (1994). Design and Construction of Molecular Assemblies with Large Second-Order Optical Nonlinearities. Quantum Chemical Aspects.
6. G. D. Stucky, S. R. Marder, and J. E. Sohn, in Materials for Nonlinear Optics-Chemical Perspectives, S. R. Marder, J. E. Sohn, and G. D. Stucky, Eds., ACS Symposium Series 455, American Chemical Society, Washington, DC, 1991, pp. 2-30. Linear and Nonlinear Polarizability: A Primer.
7. C. E. Dykstra, S.-Y. Liu, and D. J. Malik, in Advances in Chemical Physics, I. Prigogine and S. A. Rice, Eds., Wiley, New York, 1989, Vol. 75, pp. 37-112. Ab Initio Determination of Molecular Electrical Properties.
8. R. W. Boyd, Nonlinear Optics, Academic Press, San Diego, CA, 1992.
9. H. Margenau and G. M. Murphy, The Mathematics of Physics and Chemistry, Van Nostrand, Princeton, NJ, 1956.
10. B. A. Reinhardt, Trends Polym. Sci., 4, 287 (1996). Third-Order Nonlinear Optical Polymers.
11. B. A. Reinhardt, Trends Polym. Sci., 1, 4 (1993). The Status of Third-Order Polymeric Nonlinear Optical Materials.
12. B. A. Reinhardt, in Encyclopedia of Advanced Materials, D. Bloor, R. J. Brook, M. C. Flemings, S. Mahajan, and W. Cahn, Eds., Pergamon Press, Oxford, 1994, pp. 1784-1793. Nonlinear Optical Materials: χ(3) Polymers.
13. R. Dagani, Chem. Eng. News, Sept. 23, 1996, pp. 68-70. Two Photons Shine in 3-D Data Storage.
14. A. Buckley, Adv. Mater., 4, 153 (1992). Polymers for Nonlinear Optics.
15. T. Kaino and S. Tomaru, Adv. Mater., 5, 172 (1993). Organic Materials for Nonlinear Optics.
16. D. F. Eaton, G. R. Meredith, and J. S. Miller, Adv. Mater., 3, 564 (1991). Molecular Nonlinear Optical Materials-Potential Applications.
17. R. Dorn, D. Baums, P. Kersten, and R. Regener, Adv. Mater., 4, 464 (1992). Nonlinear Optical Materials for Integrated Optics: Telecommunications and Sensors.
18. P. N. Prasad and D. J. Williams, Introduction to Nonlinear Optical Effects in Molecules and Polymers, Wiley, New York, 1991.
19. J. Zyss, Ed., Molecular Nonlinear Optics, Academic Press, New York, 1994.
20. G. J. Ashwell and D. Bloor, Eds., Organic Materials for Non-Linear Optics, Vol. III, Royal Society of Chemistry, Cambridge, 1993.
21. G. A. Lindsay and K. D. Singer, Eds., Polymers for Second-Order Nonlinear Optics, ACS Symposium Series 601, American Chemical Society, Washington, DC, 1995.
22. B. E. A. Saleh and M. C. Teich, Fundamentals of Photonics, Wiley-Interscience, New York, 1991.
23. A. Marrakchi, Photonic Switching and Interconnects, Dekker, New York, 1994.
24. A. D. Buckingham, Adv. Chem. Phys., 12, 107-142 (1967). Permanent and Induced Molecular Moments and Long-Range Intermolecular Forces.
25. A. Willetts, J. E. Rice, D. M. Burland, and D. P. Shelton, J. Chem. Phys., 97, 7590 (1992). Problems in the Comparison of Theoretical and Experimental Hyperpolarizabilities.
26. P. W. Atkins and R. S. Friedman, Molecular Quantum Mechanics, 3rd ed., Oxford University Press, Oxford, 1997.
27. D. A. Kleinman, Phys. Rev., 126, 1977 (1962). Nonlinear Dielectric Polarization in Optical Media.
28. D. J. Williams, in Materials for Nonlinear Optics-Chemical Perspectives, S. R. Marder, J. E. Sohn, and G. D. Stucky, Eds., ACS Symposium Series 455, American Chemical Society, Washington, DC, 1991, pp. 31-49. Second-Order Nonlinear Optical Processes in Molecules and Solids.
29. H. D. Cohen and C. C. J. Roothaan, J. Chem. Phys., 43, S34 (1965). Electric Dipole Polarizability of Atoms by the Hartree-Fock Method. I. Theory for Closed-Shell Systems.
30. R. J. Bartlett and G. D. Purvis III, Phys. Rev. A, 20, 1313 (1979). Molecular Hyperpolarizabilities. I. Theoretical Calculations Including Correlation.
31. H. A. Kurtz, J. J. P. Stewart, and K. M. Dieter, J. Comput. Chem., 11, 82 (1990). Calculations of the Nonlinear Optical Properties of Molecules.
32. F. Sim, S. Chin, M. Dupuis, and J. E. Rice, J. Phys. Chem., 97, 1158 (1993). Electron Correlation Effects in Hyperpolarizabilities of p-Nitroaniline.
33. D. E. Woon and T. H. Dunning, Jr., J. Chem. Phys., 100, 2975 (1994). Gaussian Basis Sets for Use in Correlated Molecular Calculations. IV. Calculation of Static Electrical Response Properties.
34. G. Maroulis and A. J. Thakkar, J. Chem. Phys., 93, 4164 (1990). Polarizabilities and Hyperpolarizabilities of Carbon Dioxide.
35. D. M. Bishop and G. Maroulis, J. Chem. Phys., 82, 2380 (1985). Accurate Prediction of Static Polarizabilities and Hyperpolarizabilities. A Study on FH (X ¹Σ⁺).
36. M. J. S. Dewar and J. J. P. Stewart, Chem. Phys. Lett., 111, 416 (1984). A New Procedure for Calculating Molecular Polarizabilities: Applications Using MNDO.
37. M. Jaszunski, Chem. Phys. Lett., 140, 130 (1987). A Mixed Numerical-Analytical Approach to the Calculation of Non-Linear Electric Properties.
38. C. Flytzanis, in Quantum Electronics: A Treatise, H. Rabin and C. L. Tang, Eds., Academic Press, New York, 1975, Vol. 1, pp. 9-207. Theory of Nonlinear Optical Susceptibilities.
39. J. O. Morley, P. Pavlider, and D. Pugh, Int. J. Quantum Chem., 43, 7 (1992). On the Calculation of the Hyperpolarizabilities of Organic Molecules by the Sum Over Virtual Excited States Method.
40. P. K. Franken and J. F. Ward, Rev. Mod. Phys., 35, 23 (1963). Optical Harmonics and Nonlinear Phenomena.
41. J. F. Ward, Rev. Mod. Phys., 37, 1 (1965). Calculation on Nonlinear Optical Susceptibilities Using Diagrammatic Perturbation Theory.
42. B. J. Orr and J. F. Ward, Mol. Phys., 20, 513 (1971). Perturbation Theory of the Nonlinear Optical Polarization of an Isolated System.
43. D. R. Kanis, T. J. Marks, and M. A. Ratner, Int. J. Quantum Chem., 43, 61 (1992). Calculation of Quadratic Hyperpolarizabilities for Organic π Electron Chromophores: Molecular Geometry Sensitivity of Second-Order Nonlinear Optical Response.
44. F. Meyers, S. R. Marder, B. M. Pierce, and J. L. Brédas, J. Am. Chem. Soc., 116, 10703 (1994). Electric Field Modulated Nonlinear Optical Properties of Donor-Acceptor Polyenes: Sum-Over-States Investigations of the Relationship Between Molecular Polarizabilities (α, β, and γ) and Bond Length Alternation.
45. T. Inoue and S. Iwata, Chem. Phys. Lett., 167, 566 (1990). Method of Frequency-Dependent Hyperpolarizability Calculation from Large-Scale CI Matrices.
46. C. W. Dirk, L.-T. Cheng, and M. G. Kuzyk, Int. J. Quantum Chem., 43, 27 (1992). A Simplified Three-Level Model Describing the Molecular Third-Order Nonlinear Optical Susceptibility.
47. J. Frenkel, Wave Mechanics, Advanced General Theory, Oxford University Press, London, 1934.
48. R. McWeeny, Methods of Molecular Quantum Mechanics, 2nd ed., Academic Press, San Diego, CA, 1989.
49. H. Sekino and R. J. Bartlett, J. Chem. Phys., 85, 976 (1986). Frequency Dependent Nonlinear Optical Properties of Molecules.
50. S. P. Karna and M. Dupuis, J. Comput. Chem., 12, 487 (1991). Frequency Dependent Nonlinear Optical Properties of Molecules: Formulation and Implementation in the HONDO Program.
51. H. Sekino and R. J. Bartlett, Int. J. Quantum Chem., 43, 119 (1992). New Algorithm for High-Order Time-Dependent Hartree-Fock Theory for Nonlinear Optical Properties.
52. S. P. Karna, Chem. Phys. Lett., 214, 186 (1993). A "Direct" Time-Dependent Coupled Perturbed Hartree-Fock-Roothaan Approach to Calculate Molecular (Hyper)polarizabilities.
53. C. E. Dykstra and P. G. Jasien, Chem. Phys. Lett., 109, 388 (1984). Derivative Hartree-Fock Theory to All Orders.
54. H. Sekino and R. J. Bartlett, in Nonlinear Optical Materials: Theory and Modeling, S. P. Karna and A. T. Yeates, Eds., American Chemical Society, Washington, DC, 1996, pp. 78-101. Sum-Over-State Representation of Non-linear Response Properties in Time-Dependent Hartree-Fock Theory: The Role of State Truncation.
55. S. P. Karna, J. Chem. Phys., 104, 6590 (1996). Spin-Unrestricted Time-Dependent Hartree-Fock Theory of Frequency-Dependent Linear and Nonlinear Optical Properties.
56. J. Linderberg and N. Y. Öhrn, Propagators in Quantum Chemistry, Academic Press, New York, 1973.
57. J. Oddershede, Adv. Chem. Phys., 69, 201 (1987). Propagator Methods.
58. J. Olsen and P. Jørgensen, J. Chem. Phys., 82, 3235 (1985). Linear and Nonlinear Response Functions for an Exact State and for an MCSCF State.
59. W. A. Parkinson and J. Oddershede, J. Chem. Phys., 94, 7251 (1991). Quadratic Response Theory of Frequency-Dependent First Hyperpolarizability. Calculations in the Dipole Length and Mixed-Velocity Formalisms.
60. P. Norman, D. Jonsson, O. Vahtras, and H. Ågren, Chem. Phys., 203, 23 (1996). Non-linear Electric and Magnetic Properties Obtained from Cubic Response Functions in the Random Phase Approximation.
61. P. Norman, D. Jonsson, O. Vahtras, and H. Ågren, Chem. Phys. Lett., 242, 7 (1995). Cubic Response Functions in the Random Phase Approximation.
62. P. Norman, Y. Luo, D. Jonsson, and H. Ågren, J. Chem. Phys., 106, 1827 (1997). The Hyperpolarizability of trans-Butadiene: A Critical Test Case for Quantum Chemical Models.
63. D. Jonsson, P. Norman, and H. Ågren, J. Chem. Phys., 105, 6401 (1996). Cubic Response Functions in the Multiconfiguration Self-Consistent Field Approximation.
64. J. E. Rice and N. C. Handy, J. Chem. Phys., 94, 4959 (1991). The Calculation of Frequency-Dependent Polarizabilities as Pseudo-Energy Derivatives.
65. J. E. Rice and N. C. Handy, Int. J. Quantum Chem., 43, 91 (1992). The Calculation of Frequency-Dependent Hyperpolarizabilities Including Electron Correlation Effects.
66. K. Sasagane, F. Aiga, and R. Itoh, J. Chem. Phys., 99, 3738 (1993). Higher-Order Response Theory Based on the Quasienergy Derivatives: The Derivation of the Frequency-Dependent Polarizabilities and Hyperpolarizabilities.
67. F. Aiga, K. Sasagane, and R. Itoh, J. Chem. Phys., 99, 3779 (1993). Frequency-Dependent Hyperpolarizabilities in the Møller-Plesset Perturbation Theory.
68. F. Aiga and R. Itoh, Chem. Phys. Lett., 251, 372 (1996). Calculation of Frequency-Dependent Polarizabilities and Hyperpolarizabilities by the Second-Order Møller-Plesset Perturbation Theory.
69. J. F. Stanton and R. J. Bartlett, J. Chem. Phys., 99, 5178 (1993). A Coupled-Cluster Based Effective Hamiltonian Method for Dynamic Electric Polarizabilities.
70. H. Sekino and R. J. Bartlett, Chem. Phys. Lett., 234, 87 (1995). Frequency-Dependent Hyperpolarizabilities in the Coupled-Cluster Method: The Kerr Effect for Molecules.
71. D. Feller and E. R. Davidson, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1990, Vol. 1, pp. 1-43. Basis Sets for Ab Initio Molecular Orbital Calculations and Intermolecular Interactions.
72. G. J. B. Hurst, M. Dupuis, and E. Clementi, J. Chem. Phys., 89, 385 (1988). Ab Initio Analytic Polarizability, First and Second Hyperpolarizabilities of Large Conjugated Organic Molecules: Applications to Polyenes C4H6 to C22H24.
73. A. T. Yeates and D. S. Dudis, Abstracts, 208th National Meeting of the American Chemical Society, Washington, DC, August 1994. Moderate-Sized Diffuse Gaussian Basis Sets for the Ab Initio Evaluation of Excited-State Energies and Nonlinear Optical Coefficients.
74. M. A. Spackman, J. Phys. Chem., 93, 7594 (1989). Accurate Prediction of Static Dipole Polarizabilities with Moderately Sized Basis Sets.
75. A. J. Sadlej, Collect. Czech. Chem. Commun., 53, 1995 (1988). Medium-Size Polarized Basis Sets for High-Level Correlated Calculations of Molecular Electric Properties.
76. A. J. Sadlej, Theor. Chim. Acta, 79, 123 (1991). Medium-Size Polarized Basis Sets for High-Level Correlated Calculations of Molecular Properties. II. Second-Row Atoms: Si Through Cl.
77. H. Sekino and R. J. Bartlett, J. Chem. Phys., 98, 3022 (1993). Molecular Hyperpolarizabilities.
78. M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. Su, T. L. Windus, M. Dupuis, and J. A. Montgomery, J. Comput. Chem., 14, 1347 (1993). General Atomic and Molecular Electronic Structure System.
79. B. Kirtman, Int. J. Quantum Chem., 43, 147 (1992). Nonlinear Optical Properties of Conjugated Polyenes from Ab Initio Finite Oligomer Calculations.
80. J. O. Morley and D. Pugh, in Organic Materials for Non-Linear Optics, R. B. Hann and D. Bloor, Eds., Royal Society of Chemistry, London, 1988, pp. 28-39. Semiempirical Calculations of Molecular Hyperpolarizabilities.
81. R. J. Bartlett and H. Sekino, in Nonlinear Optical Materials: Theory and Modeling, S. P. Karna and A. T. Yeates, Eds., American Chemical Society, Washington, DC, 1996, pp. 23-57. Can Quantum Chemistry Provide Reliable Molecular Hyperpolarizabilities?
82. P. K. Korambath and H. A. Kurtz, in Nonlinear Optical Materials: Theory and Modeling, S. P. Karna and A. T. Yeates, Eds., American Chemical Society, Washington, DC, 1996, pp. 133-144. Frequency-Dependent Polarizabilities and Hyperpolarizabilities of Polyenes.
83. C. Adant, M. Dupuis, and J. L. Brédas, Int. J. Quantum Chem., Quantum Chem. Symp., 29, 497 (1995). Ab Initio Study of the Nonlinear Optical Properties of Urea: Electron Correlation and Dispersion Effects.
84. E. Perrin, P. N. Prasad, P. Mougenout, and M. Dupuis, J. Chem. Phys., 91, 4728 (1989). Ab Initio Calculations of Polarizability and Second Hyperpolarizability in Benzene Including Electron Correlation Treated by Møller-Plesset Theory.
85. B. Kirtman, in Nonlinear Optical Materials: Theory and Modeling, S. P. Karna and A. T. Yeates, Eds., American Chemical Society, Washington, DC, 1996, pp. 58-78. Calculation of Nonlinear Optical Properties of Conjugated Polyenes.
86. F. Sim, D. R. Salahub, and S. Chin, Int. J. Quantum Chem., 43, 463 (1992). The Accurate Calculation of Dipole Moments and Dipole Polarizabilities Using Gaussian-Based Density Functional Methods.
87. R. M. Dickson and A. D. Becke, J. Phys. Chem., 100, 16105 (1996). Local Density-Functional Polarizabilities and Hyperpolarizabilities at the Basis-Set Limit.
88. B. J. Dunlap and S. P. Karna, in Nonlinear Optical Materials: Theory and Modeling, S. P. Karna and A. T. Yeates, Eds., American Chemical Society, Washington, DC, 1996, pp. 164-173. A Combined Hartree-Fock and Local-Density-Functional Method to Calculate Linear and Nonlinear Optical Properties of Molecules.
89. A. M. Lee and S. M. Colwell, J. Chem. Phys., 101, 9704 (1994). The Determination of Hyperpolarizabilities Using Density Functional Theory with Nonlocal Functionals.
90. D. M. Bishop, B. Kirtman, H. A. Kurtz, and J. E. Rice, J. Chem. Phys., 98, 8024 (1993). Calculation of Vibrational Dynamic Hyperpolarizabilities for H2O, CO2, and NH3.
91. D. M. Bishop and B. Kirtman, J. Chem. Phys., 95, 2646 (1991). A Perturbation Method for Calculating Vibrational Dynamic Dipole Polarizabilities and Hyperpolarizabilities.
92. D. M. Bishop and B. Kirtman, J. Chem. Phys., 97, 5255 (1992). Compact Formulas for Vibrational Dynamic Dipole Polarizabilities.
93. B. Kirtman and B. Champagne, Int. Rev. Phys. Chem., 16, 389 (1997). Nonlinear Optical Properties of Quasilinear Conjugated Oligomers, Polymers and Organic Molecules.
94. A. Willetts and J. E. Rice, J. Chem. Phys., 99, 426 (1993). A Study of Solvent Effects on Hyperpolarizabilities: The Reaction Field Model Applied to Acetonitrile.
95. J. Yu and M. C. Zerner, J. Chem. Phys., 100, 7487 (1994). Solvent Effect on the First Hyperpolarizabilities of Conjugated Organic Molecules.
96. K. V. Mikkelsen, Y. Luo, H. Ågren, and P. Jørgensen, J. Chem. Phys., 102, 9362 (1995). Sign Change of Hyperpolarizabilities of Solvated Water.
97. I. D. L. Albert, S. di Bella, D. R. Kanis, T. J. Marks, and M. A. Ratner, in Polymers for Second-Order Nonlinear Optics, American Chemical Society, Washington, DC, 1995, pp. 57-65. Solvent Effects on the Molecular Quadratic Hyperpolarizabilities.
98. R. Cammi, M. Cossi, and J. Tomasi, J. Chem. Phys., 104, 4611 (1996). Analytical Derivatives for Molecular Solutes. III. Hartree-Fock Static Polarizability and Hyperpolarizabilities in the Polarizable Continuum Model.
99. R. Cammi, M. Cossi, B. Mennucci, and J. Tomasi, J. Chem. Phys., 105, 10556 (1996). Analytical Hartree-Fock Calculation of the Dynamic Polarizabilities α, β, and γ of Molecules in Solution.
100. R. Cammi and J. Tomasi, Int. J. Quantum Chem., 29, 465 (1995). Nonequilibrium Solvation Theory for the Polarizable Continuum Model: A New Formulation at the SCF Level with Application to the Case of the Frequency-Dependent Linear Electric Response Function.
101. D. S. Dudis, A. T. Yeates, and H. A. Kurtz, Mater. Res. Soc. Symp. Proc., 247, 93 (1992). Intermolecular Effects on Third-Order Nonlinear Optical Properties.
102. S. Chen and H. A. Kurtz, J. Mol. Struct. (THEOCHEM), 388, 79 (1996). NLO Properties of Interacting Polyene Chains.
CHAPTER 6
Sensitivity Analysis in Biomolecular Simulation
Chung F. Wong,* Tom Thacher,† and Herschel Rabitz‡
*Department of Physiology and Biophysics, Mount Sinai School of Medicine, New York, New York 10029-6574; †Virtual Chemistry, Inc., 7770 Reagents Road #252, San Diego, California 92122; ‡Department of Chemistry, Princeton University, Princeton, New Jersey 08544
Reviews in Computational Chemistry, Volume 12. Kenny B. Lipkowitz and Donald B. Boyd, Editors. Wiley-VCH, John Wiley and Sons, Inc., New York, © 1998.

INTRODUCTION
Molecular simulation is playing an increasingly important role in studying the properties of complicated systems such as proteins, DNAs, lipids, and complexes of biomolecules.1-5 A key advantage of molecular simulation is that it can help one understand in microscopic detail how the components of a complex biomolecular system, and their interactions, determine the functional properties of the system. Yet sorting out the critical factors affecting the properties of a complex biomolecular system is typically not an obvious task, because the intramolecular and intermolecular interactions giving rise to those properties are quite complex. Even with the relatively simple force fields currently being used in biomolecular simulations, one needs to select, from an enormous number of possible factors (easily in the thousands), those that are truly significant in determining a key property of a biomolecular system. In principle, one can do chemical modification or genetic experiments to examine the role of a specific functional group, amino acid residue, or interaction in determining a specific property. This approach is still quite expensive to carry out systematically, either experimentally or computationally. Consider a moderate-sized protein with about 200 residues: since more than 200 mutations or chemical modifications are needed to systematically map the residues that are significant in determining a specific property of the protein, this would be a very time-consuming and expensive assessment to carry out. New strategies need to be developed to identify efficiently the determinants of biomolecular structures and functions, and to guide the design of novel bioactive molecules.

Many problems in engineering research are analogous, in principle, to the problem of dissecting the determinants of biomolecular properties. For example, chemical engineers may be interested in optimizing the performance of a chemical reactor. To achieve this goal, they need to identify the key parameters of the reactor that control its performance and then examine how these parameters can be optimized to improve the reactor's output. Models can be set up to simulate the reactions that take place in a chemical reactor. The parameters of the model can then be varied one by one to examine how each affects the performance of the chemical reactor. The most significant parameters controlling the performance of the reactor can be identified and then optimized to improve this performance. It is difficult to carry out a sensitivity analysis in this manner when multiple parameters influence the performance of the reactor, however, because many simulations with different sets of model parameters need to be carried out. To overcome this problem, engineers have employed a simple trick. They calculate derivatives of observables/properties of a simulation model with respect to the parameters of the model instead of making many explicit variations of model parameters.
This clever approach makes it much easier to identify the determinants of systems properties in a systematic manner. This chapter focuses on a recent development: the extension of this idea of sensitivity analysis to the domain of biomolecular modeling. Readers who are interested in studying sensitivity analysis in engineering research, chemical kinetics, or small-molecule dynamics can refer to books and reviews that have been published.6-11 This chapter presents four applications of sensitivity analysis in biomolecular modeling: the identification of the determinants of biomolecular properties, the design of bioactive molecules, the study of error propagation in simulating biomolecular properties arising from nonoptimal biomolecular force field parameters, and the refinement of potential models for biomolecular simulations.
METHODS
As mentioned earlier, an efficient means of probing the sensitivity of the properties of a model biomolecular system is to calculate the derivatives of the properties of the system with respect to the model parameters. (Model parameters are not limited to force field parameters: consider, e.g., nonbonded cutoffs for electrostatic interactions.) Because these derivatives probe the responses of the properties of a biomolecular system when different model parameters are perturbed, they are also called sensitivity coefficients. Model parameters that play no role in determining a given set of properties give sensitivity coefficients with negligible or zero magnitude. On the other hand, model parameters that give large sensitivity coefficients can reveal the key determinants of the properties of the biomolecular system. Sensitivity coefficients are classified as first order, second order, and so on, depending on the order of the derivatives. In most applications, only first- and second-order sensitivity coefficients are calculated and analyzed. Each first-order sensitivity coefficient provides a one-to-one relationship between a property and a model parameter. A second-order sensitivity coefficient illuminates the cooperative or anticooperative effects of two model parameters. It is impractical to calculate and study higher order sensitivity coefficients because long simulations are usually required to obtain reliable values for them. Alternatively, one can carry out a singular value decomposition12 (which is equivalent to a principal component analysis13) of a sensitivity matrix whose elements are sensitivity coefficients to study how a group of model parameters act together to affect a group of biomolecular properties. Examples of this are presented later.

The computational efficiency of a sensitivity analysis stems from the analyst's ability to obtain all the sensitivity coefficients by carrying out simulations on only one reference system. This is in contrast to making explicit changes to model parameters and then repeating the simulations hundreds or thousands of times. Parameters of different types usually exist in a simulation model.
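The claim that a single reference simulation suffices can be illustrated with a standard canonical-ensemble identity, d⟨O⟩/dλ = ⟨∂O/∂λ⟩ - (⟨O ∂V/∂λ⟩ - ⟨O⟩⟨∂V/∂λ⟩)/kT, where V is the potential energy. The sketch below applies it to a one-dimensional harmonic oscillator, chosen here only because the exact answer is known. The function and variable names are hypothetical, and the "simulation" is replaced by direct Gaussian sampling of the Boltzmann distribution. This is an illustration under those assumptions, not necessarily the form of the expression developed in this chapter.

```python
import random


def sensitivity_from_reference(samples, observable, dobservable_dlam,
                               dpotential_dlam, kT):
    """First-order sensitivity coefficient from one set of reference samples.

    Implements d<O>/dlam = <dO/dlam> - (<O dV/dlam> - <O><dV/dlam>) / kT.
    """
    n = len(samples)
    o = [observable(x) for x in samples]
    dv = [dpotential_dlam(x) for x in samples]
    mean_o = sum(o) / n
    mean_dv = sum(dv) / n
    mean_do = sum(dobservable_dlam(x) for x in samples) / n
    covariance = sum(a * b for a, b in zip(o, dv)) / n - mean_o * mean_dv
    return mean_do - covariance / kT


# Reference system: V = k x^2 / 2. Its Boltzmann distribution is a Gaussian
# with variance kT / k, so it can be sampled exactly for this illustration.
kT, k = 1.0, 2.0
rng = random.Random(0)
samples = [rng.gauss(0.0, (kT / k) ** 0.5) for _ in range(200_000)]

estimate = sensitivity_from_reference(
    samples,
    observable=lambda x: x * x,              # O = x^2
    dobservable_dlam=lambda x: 0.0,          # O does not depend explicitly on k
    dpotential_dlam=lambda x: 0.5 * x * x,   # dV/dk
    kT=kT,
)
exact = -kT / k**2                           # because <x^2> = kT / k
print(estimate, exact)
```

The same set of samples can be reused for every parameter of interest: only the function supplied for ∂V/∂λ changes, which is the source of the efficiency described above.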
Some of the most interesting parameters to include in a sensitivity analysis are the ones that determine the intra- and intermolecular interactions, because the properties of a biomolecular system at any given thermodynamic state are determined solely by these interactions. An interesting multifaceted question to ask is: Which features of the intramolecular and intermolecular interactions are most significant in determining a particular property of a biomolecular system, and to which components of the system can these features be mostly attributed? Such investigations can help to decipher the determinants of biomolecular structures and functions, and they can suggest how (bio)molecules may be modified to alter their structural or functional properties in a well-planned manner. (In this chapter, we use “biomolecules” in statements that pertain specifically to biomolecules such as proteins and DNA, and we use “(bio)molecules” in statements that pertain to either biomolecules or small molecules such as therapeutic drugs.) The interactions among the atoms of a biomolecular system are usually modeled by approximate potential energy functions such as the one shown in Eq. [1] (Refs. 1-3, 14, 15):
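$$V = \sum_{\text{bonds}} \tfrac{1}{2} K_b (b - b_0)^2 + \sum_{\text{angles}} \tfrac{1}{2} K_\theta (\theta - \theta_0)^2 + \sum_{\text{dihedrals}} K_\phi \left[ 1 + \cos(n\phi - \delta) \right] + \sum_{\text{pairs}} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{C_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{4\pi\varepsilon_0 \varepsilon_r r_{ij}} \right] \qquad [1]$$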
where the first sum is over all the bonds, the second sum is over all the bond angles, the third sum is over all the dihedral angles, and the last sum is over all the nonbonded pair interactions. In this equation, the harmonic approximation is used to describe the bond-stretching and angle-bending energies, and sinusoidal functions are used to describe energy changes due to rotations about bonds. The last term is the nonbonded interaction term, which is a sum over all pairs of atoms that are separated by a specified number of bonds (three, e.g., in the GROMOS16 force field). In this example, the nonbonded interactions are modeled by a sum of Lennard-Jones potentials and Coulombic-type electrostatic potentials.

To identify the model parameters that are most significant in affecting biomolecular properties, one needs to calculate many sensitivity coefficients relating these properties to the parameters. For any property characterized by the variable $O$, whose ensemble-averaged value is $\langle O \rangle$, the derivative of $\langle O \rangle$ with respect to a parameter $\lambda_i$ can be shown to be

$$\frac{\partial \langle O \rangle}{\partial \lambda_i} = \left\langle \frac{\partial O}{\partial \lambda_i} \right\rangle - \beta \left[ \left\langle O \frac{\partial H}{\partial \lambda_i} \right\rangle - \langle O \rangle \left\langle \frac{\partial H}{\partial \lambda_i} \right\rangle \right] \qquad [2]$$
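In Eq. [2], $H$ is the classical Hamiltonian of the system of interest and $\beta = 1/k_B T$, in which $k_B$ and $T$ are the Boltzmann constant and the absolute temperature, respectively. If $O$ does not depend explicitly on the parameter $\lambda_i$, Eq. [2] simplifies to:

$$\frac{\partial \langle O \rangle}{\partial \lambda_i} = -\beta \left[ \left\langle O \frac{\partial H}{\partial \lambda_i} \right\rangle - \langle O \rangle \left\langle \frac{\partial H}{\partial \lambda_i} \right\rangle \right] \qquad [3]$$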
so that the parametric derivative of $\langle O \rangle$ is simply proportional to the covariance of $O$ and the partial derivative of $H$ with respect to $\lambda_i$. (Equations [2] and [3] and other similar equations shown below allow partial derivatives to be calculated analytically instead of numerically.) To facilitate a comparison of different sensitivity coefficients, it is sometimes convenient to calculate the fractional change of a property due to a fractional change of a model parameter. It is then useful to calculate dimensionless normalized sensitivity coefficients of the form

$$\frac{\partial \ln \langle O \rangle}{\partial \ln \lambda_i} = \frac{\lambda_i}{\langle O \rangle} \frac{\partial \langle O \rangle}{\partial \lambda_i} \qquad [4]$$
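As a numerical illustration of Eqs. [3] and [4], the following Python sketch estimates a first-order sensitivity coefficient from samples of a single reference ensemble. The one-dimensional harmonic Hamiltonian, the observable $x^2$, and all function and variable names are assumptions chosen for illustration; they are not taken from the studies cited in this chapter. The toy system is used only because its exact answer is known.

```python
import numpy as np

def first_order_sensitivity(obs, dH_dlam, beta):
    """Eq. [3]: d<O>/d(lambda) = -beta * cov(O, dH/dlambda), estimated from samples."""
    obs = np.asarray(obs, dtype=float)
    dH_dlam = np.asarray(dH_dlam, dtype=float)
    cov = np.mean(obs * dH_dlam) - np.mean(obs) * np.mean(dH_dlam)
    return -beta * cov

def normalized_sensitivity(obs, dH_dlam, beta, lam):
    """Eq. [4]: d ln<O> / d ln(lambda) = (lambda / <O>) * d<O>/d(lambda)."""
    return lam / np.mean(obs) * first_order_sensitivity(obs, dH_dlam, beta)

# Hypothetical reference ensemble: H = 0.5*k*x**2, sampled exactly from its
# Boltzmann distribution (a Gaussian with variance 1/(beta*k)).
rng = np.random.default_rng(0)
beta, k = 1.0, 2.0
x = rng.normal(0.0, np.sqrt(1.0 / (beta * k)), size=400_000)

obs = x**2           # observable O = x^2, with no explicit dependence on k
dH_dk = 0.5 * x**2   # partial derivative of H with respect to k

sens = first_order_sensitivity(obs, dH_dk, beta)
norm = normalized_sensitivity(obs, dH_dk, beta, k)
exact = -1.0 / (beta * k**2)   # analytic value, from <x^2> = 1/(beta*k)
```

For this toy system $\langle x^2 \rangle = 1/(\beta k)$, so `sens` should approach `exact` and the normalized coefficient should approach $-1$ as the sample grows. No second simulation at a perturbed value of `k` is needed, which is the point made in the text about working from one reference system.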
It is useful to point out that two sensitivity coefficients associated with parameters of the same type may give different sensitivities. For example, it is common in current biomolecular force fields to use the same force field parameter for chemically similar groups (e.g., the atomic partial charges for all the amide nitrogens have the same value in commonly used force fields such as those in the GROMOS,16 CHARMM,20 and AMBER21 molecular modeling packages).
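If, however, the environments of the atoms with which these parameters are associated differ, these identical parameters may generate different sensitivities for a given property of a protein. In fact, these identical force field parameters can be viewed as “probes” for studying different local environments of a protein, just as a spectroscopic probe is used to learn about the structural and dynamical properties of different parts of a protein. Therefore, sensitivity analysis provides an efficient tool for thoroughly probing the properties of different parts of a protein.

A second-order sensitivity coefficient describing the cooperative/anticooperative effects of two model parameters, $\lambda_i$ and $\lambda_j$, in affecting an ensemble-averaged property $\langle O \rangle$ is given by

$$\frac{\partial^2 \langle O \rangle}{\partial \lambda_i \partial \lambda_j} = \left\langle \frac{\partial^2 O}{\partial \lambda_i \partial \lambda_j} \right\rangle - \beta \left[ \left\langle \delta\!\left(\frac{\partial O}{\partial \lambda_i}\right) \delta\!\left(\frac{\partial H}{\partial \lambda_j}\right) \right\rangle + \left\langle \delta\!\left(\frac{\partial O}{\partial \lambda_j}\right) \delta\!\left(\frac{\partial H}{\partial \lambda_i}\right) \right\rangle + \left\langle \delta O \, \delta\!\left(\frac{\partial^2 H}{\partial \lambda_i \partial \lambda_j}\right) \right\rangle \right] + \beta^2 \left\langle \delta O \, \delta\!\left(\frac{\partial H}{\partial \lambda_i}\right) \delta\!\left(\frac{\partial H}{\partial \lambda_j}\right) \right\rangle \qquad [5]$$

where $\delta X \equiv X - \langle X \rangle$. When the operator $O$ does not depend explicitly on the parameters $\lambda_i$ and $\lambda_j$, Eq. [5] simplifies to:

$$\frac{\partial^2 \langle O \rangle}{\partial \lambda_i \partial \lambda_j} = -\beta \left\langle \delta O \, \delta\!\left(\frac{\partial^2 H}{\partial \lambda_i \partial \lambda_j}\right) \right\rangle + \beta^2 \left\langle \delta O \, \delta\!\left(\frac{\partial H}{\partial \lambda_i}\right) \delta\!\left(\frac{\partial H}{\partial \lambda_j}\right) \right\rangle \qquad [6]$$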
The formulas involving thermodynamic properties, however, are somewhat different because the formula for calculating a thermodynamic quantity differs from that for calculating an ensemble-averaged quantity. For example, the first-order sensitivity coefficient relating the Helmholtz free energy $A$ of a biomolecular system to a potential parameter $\lambda_i$ is expressed in the form

$$\frac{\partial A}{\partial \lambda_i} = \left\langle \frac{\partial H}{\partial \lambda_i} \right\rangle \qquad [7]$$
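The second-order sensitivity coefficient relating the free energy $A$ of a biomolecular system and two potential parameters $\lambda_i$ and $\lambda_j$ is expressed in the form

$$\frac{\partial^2 A}{\partial \lambda_i \partial \lambda_j} = \left\langle \frac{\partial^2 H}{\partial \lambda_i \partial \lambda_j} \right\rangle - \beta \left[ \left\langle \frac{\partial H}{\partial \lambda_i} \frac{\partial H}{\partial \lambda_j} \right\rangle - \left\langle \frac{\partial H}{\partial \lambda_i} \right\rangle \left\langle \frac{\partial H}{\partial \lambda_j} \right\rangle \right] \qquad [8]$$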
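The free energy formulas of Eqs. [7] and [8] can be checked in the same spirit. The sketch below is only an illustration under stated assumptions: the harmonic Hamiltonian $H = \tfrac{1}{2} k x^2$, the parameter $k$, and the helper names are hypothetical, and exact Gaussian sampling stands in for a molecular dynamics or Monte Carlo trajectory.

```python
import numpy as np

def free_energy_first(dH_i):
    """Eq. [7]: dA/d(lambda_i) = <dH/dlambda_i>."""
    return float(np.mean(dH_i))

def free_energy_second(d2H_ij, dH_i, dH_j, beta):
    """Eq. [8]: <d2H/dli dlj> - beta * [<dH/dli dH/dlj> - <dH/dli><dH/dlj>]."""
    cov = np.mean(dH_i * dH_j) - np.mean(dH_i) * np.mean(dH_j)
    return float(np.mean(d2H_ij) - beta * cov)

# Hypothetical reference ensemble for H = 0.5*k*x**2.
rng = np.random.default_rng(1)
beta, k = 1.0, 2.0
x = rng.normal(0.0, np.sqrt(1.0 / (beta * k)), size=400_000)

dH_dk = 0.5 * x**2
d2H_dk2 = np.zeros_like(x)   # H is linear in k, so its second derivative vanishes

dA = free_energy_first(dH_dk)
d2A = free_energy_second(d2H_dk2, dH_dk, dH_dk, beta)

# Analytic values from A = (1/(2*beta)) * ln(beta*k/(2*pi)).
exact_dA = 1.0 / (2.0 * beta * k)
exact_d2A = -1.0 / (2.0 * beta * k**2)
```

Both derivatives come from averages over the same unperturbed samples. The second derivative depends on a fluctuation (a covariance), so in practice it needs more sampling than the first.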
A special type of sensitivity coefficient probes the structural responses of a biomolecule to perturbations introduced to different parts of the biomolecule. In molecular mechanics,22,23 a Green’s function matrix containing this information can be derived as follows. The x, y, or z component of a force $F_i$ acting on an atom of a molecule is given by

$$F_i = -\frac{\partial V}{\partial x_i} \qquad [9]$$
where $x_i$ is the x, y, or z component of the Cartesian coordinate of an atom. Taking the derivative of the force with respect to a potential parameter $\lambda$, one obtains the following equation for the parametric derivative of the force:
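The condition

$$\frac{dF_i}{d\lambda} = 0 \qquad [11]$$

applies when a molecule is at a local or global minimum of its potential energy surface. Equation [10] can also be written in matrix form as follows:

$$-\mathbf{H}\mathbf{S} + \mathbf{M} = \mathbf{0} \qquad [12]$$

where $\mathbf{H}$ is the Hessian matrix containing the elements $\partial^2 V / \partial x_i \partial x_j$, $\mathbf{S}$ is a sensitivity matrix containing the parametric derivatives $\partial x_i / \partial \lambda$ as matrix elements, and matrix $\mathbf{M}$ is composed of the elements $\partial^2 V / \partial x_i \partial \lambda$. A Green’s function matrix satisfying the relation

$$\mathbf{H}\mathbf{G} = \mathbf{I} \qquad [13]$$

where $\mathbf{I}$ is a unit matrix, can be constructed so that the sensitivity matrix can be obtained as follows:

$$\mathbf{S} = \mathbf{G}\mathbf{M} \qquad [14]$$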
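The relation $\mathbf{H}\mathbf{G} = \mathbf{I}$ implies that $\mathbf{G}$ maps small applied forces to displacements of the energy minimum. The Python sketch below checks this on a deliberately small example. The three-bead one-dimensional chain, its tether, the quartic bond term, and every parameter value are hypothetical choices made for illustration, and the finite-difference Hessian and Newton minimizer are stand-ins for whatever a molecular mechanics package would provide.

```python
import numpy as np

# Hypothetical toy parameters (arbitrary units).
K_TETHER, K_BOND, C_QUARTIC, D0 = 5.0, 10.0, 50.0, 1.0

def gradient(x):
    """Gradient of V = 0.5*K_TETHER*x0^2 + sum_b [0.5*K_BOND*u^2 + 0.25*C_QUARTIC*u^4],
    where u is the stretch of bond b from its reference length D0."""
    g = np.zeros_like(x)
    g[0] += K_TETHER * x[0]
    u = np.diff(x) - D0
    dV_du = K_BOND * u + C_QUARTIC * u**3
    g[:-1] -= dV_du
    g[1:] += dV_du
    return g

def hessian(x, h=1e-5):
    """Central finite-difference Hessian built from the analytic gradient."""
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        H[:, j] = (gradient(x + step) - gradient(x - step)) / (2.0 * h)
    return 0.5 * (H + H.T)

def minimize_with_force(x_start, f, steps=50):
    """Newton iterations on V(x) - f.x, i.e., the minimum under a constant applied force f."""
    x = x_start.copy()
    for _ in range(steps):
        x -= np.linalg.solve(hessian(x), gradient(x) - f)
    return x

x_min = np.array([0.0, D0, 2.0 * D0])    # unperturbed minimum of the toy chain
G = np.linalg.inv(hessian(x_min))        # Eq. [13]: H G = I

f = np.array([0.0, 0.0, 1e-3])           # small force applied to the last bead
predicted = G @ f                        # linear-response displacement from G
actual = minimize_with_force(x_min, f) - x_min
```

For a force this small the explicitly re-minimized displacement `actual` and the prediction `predicted` agree closely. For larger forces the anharmonic term makes the linear prediction progressively less accurate, and the explicit matrix inversion is the step that becomes costly for large systems.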
Because matrix $\mathbf{S}$ contains parametric derivatives of structural variables and matrix $\mathbf{M}$ contains parametric derivatives of forces applied to a molecule, the elements of the Green’s function matrix $\mathbf{G}$ describe structural responses of the molecule when small forces are applied to the atoms of the molecule. (If $F_i = -\partial V / \partial x_i$ is an internal force arising from an intramolecular interaction potential $V$, $-F_i = \partial V / \partial x_i$ can be considered to be a force applied to an atom of a molecule.) One can follow a similar approach to derive a more general Green’s function that measures the change of an ensemble-averaged structural coordinate, rather than a structural coordinate of a molecule at a local or global energy minimum, when a force is applied to an atom of the molecule. Derivation through such a Green’s function approach is more complicated, but there is an easier way to derive this general Green’s function. (We shall call this a generalized Green’s function even though we shall no longer derive this function through a Green’s function approach because the physical insights that this more general function provides are similar to those of the molecular mechanics Green’s function.) In the easier derivation that follows,24 suppose a small perturbation potential $V_p$ of the form:

$$V_p = k_i \left( x_i - \langle x_i \rangle_0 \right) \qquad [15]$$

is applied to a system in which the atoms are interacting through a potential $V$. In Eq. [15], $x_i$ is a coordinate associated with atom $i$, $k_i$ is a constant, and $\langle \cdots \rangle_0$ denotes an average over an ensemble of an unperturbed system (i.e., a system with no perturbation potential $V_p$ applied). The perturbation potential $V_p$ imposes a force of $-k_i$ on atom $i$. An ensemble-averaged coordinate of atom $j$, $\langle x_j \rangle$, of the system with an applied perturbation potential $V_p$ can be written as

$$\langle x_j \rangle = \frac{\int x_j \, e^{-\beta (V + V_p)} \, d\Gamma}{\int e^{-\beta (V + V_p)} \, d\Gamma} \qquad [16]$$
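where $\int \cdots \, d\Gamma$ denotes an integration over all the atomic coordinates. For a small value of $k_i$, Eq. [16] can be approximated by

$$\langle x_j \rangle \approx \frac{\int x_j \, e^{-\beta V} (1 - \beta V_p) \, d\Gamma}{\int e^{-\beta V} (1 - \beta V_p) \, d\Gamma} \qquad [17]$$

Equation [17] also can be written in the form

$$\langle x_j \rangle \approx \langle x_j \rangle_0 - \beta k_i \langle \Delta x_j \Delta x_i \rangle_0 \qquad [18]$$

where $\Delta x_j = x_j - \langle x_j \rangle_0$. If one expands $\langle x_j \rangle$ on the left-hand side of Eq. [18] as

$$\langle x_j \rangle_0 + \left( \frac{\partial \langle x_j \rangle}{\partial k_i} \right)_{k_i = 0} k_i + \cdots = \langle x_j \rangle_0 - \beta k_i \langle \Delta x_j \Delta x_i \rangle_0 \qquad [19]$$

and compares the coefficients of the linear terms in $k_i$ on both sides of Eq. [19], noting that $\partial \langle x_j \rangle / \partial k_i = -\partial \langle x_j \rangle / \partial f_i$, the elements $G_{ji}$ of a generalized Green’s function matrix $\mathbf{G}$ can be obtained from the familiar displacement correlation function $\langle \Delta x_j \Delta x_i \rangle_0$ as

$$G_{ji} = \frac{\partial \langle x_j \rangle}{\partial f_i} = \beta \langle \Delta x_j \Delta x_i \rangle_0 \qquad [22]$$

where $f_i = -k_i$ is the x, y, or z component of an external force applied to atom $i$. Each element $G_{ji}$ of the Green’s function matrix measures the extent to which the averaged coordinate $\langle x_j \rangle$ is changed as a result of a small force applied to atom $i$. In its most general form, the Green’s function matrix is of order $3N \times 3N$, where $N$ is the number of atoms in the system. However, rigid body translation is usually not of interest, and its contribution to the Green’s function matrix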
can be easily removed. The overall tumbling motion of a molecule, on the other hand, is typically coupled to the intramolecular vibrations of the molecule. In many cases, though, it may be a good approximation to neglect the coupling between the overall tumbling motion and the intramolecular vibrations of a biomolecule, and this contribution to the Green’s function matrix can also be removed, albeit approximately. (Better methods than those currently used are needed for removing the contribution of molecular rotation to Green’s function matrix elements. A recent work on calculating positional covariance matrices from molecular dynamics trajectories showed the results to be dependent on how the rotational components were approximately removed from the positional covariance matrices.25) The resulting Green’s function matrix thus mainly accounts for structural deformations when small forces are applied to different parts of the molecule. The essence of Eq. [22] is that it quantitatively predicts the extent to which the average position of an atom is perturbed when a small force is applied to another atom, in the linear response limit. The extent of the perturbation of the position of an atom by a force applied to another atom depends both on the degree of correlation of the two atoms and on the positional fluctuations of the two atoms about their means.

The approach taken here to deriving the generalized Green’s function matrix has several advantages over the molecular mechanics approach described earlier. (1) The molecular mechanics approach only considers a molecule at a local minimum, whereas the generalized Green’s function approach accounts for an ensemble of structures accessible at physiological temperatures. (2) The calculation of a molecular mechanics Green’s function matrix requires the inversion of a Hessian matrix $\mathbf{H}$ of order $3N \times 3N$, where $N$ is the number of atoms in a molecule; this is difficult to carry out for large biomolecules. The generalized Green’s function approach, on the other hand, does not require the inversion of a large matrix, and it can therefore be applied more easily to large biomolecules. (3) Solvent effects are easier to include in the generalized Green’s function approach. In the molecular mechanics method, if the Green’s function matrix elements of the solute atoms embedded in an explicit solvent environment are sought, one in principle needs to invert a Hessian matrix that includes the solvent coordinates to account for the effects of the solvent on those matrix elements. It is probably a good approximation to neglect the matrix elements involving solvent coordinates in the Hessian matrix when only the solute matrix elements of the molecular mechanics Green’s function are needed. Nevertheless, the generalized Green’s function approach does not introduce this additional approximation if the trajectory used to calculate a positional covariance matrix is obtained from a simulation that already includes solvation effects (i.e., by the use of an explicit solvent model in a molecular dynamics simulation).
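The covariance route to the generalized Green’s function can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the three-coordinate harmonic model, its stiffness matrix `K`, and the use of exact Gaussian sampling in place of a molecular dynamics trajectory. Removal of overall translation and rotation, which the text flags as a practical difficulty, is not addressed.

```python
import numpy as np

def generalized_green_function(traj, beta):
    """Eq. [22]: G = beta * <dx dx^T>_0 from coordinates of shape (n_frames, n_coords)."""
    dx = traj - traj.mean(axis=0)
    return beta * (dx.T @ dx) / traj.shape[0]

rng = np.random.default_rng(2)
beta = 1.0

# Hypothetical stiffness matrix of a small harmonic model, V = 0.5 * x^T K x.
K = np.array([[15.0, -10.0, 0.0],
              [-10.0, 20.0, -10.0],
              [0.0, -10.0, 10.0]])
cov_exact = np.linalg.inv(K) / beta

# Unperturbed "trajectory": exact Boltzmann samples of the harmonic model.
traj0 = rng.multivariate_normal(np.zeros(3), cov_exact, size=200_000)
G = generalized_green_function(traj0, beta)

# Perturbed ensemble with a constant force f on the last coordinate.
# For a harmonic model the mean shifts by K^{-1} f and the covariance is unchanged.
f = np.array([0.0, 0.0, 0.5])
traj_f = rng.multivariate_normal(np.linalg.solve(K, f), cov_exact, size=200_000)

predicted_shift = G @ f
observed_shift = traj_f.mean(axis=0) - traj0.mean(axis=0)
```

No Hessian is inverted to obtain `G`; only a covariance matrix of the unperturbed ensemble is accumulated. In this harmonic toy case the linear-response prediction is exact apart from sampling noise, whereas for a real biomolecule it holds only for small forces.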
A Green’s function matrix can be manipulated in different ways to gain different insights into the structure and function of a biomolecule. For example, one can calculate

$$G^2_{(\text{atom } k)} = \sum_{j \in (\text{atom } k)} \sum_{i=1}^{3N} G_{ij}^2 \qquad [23]$$

for a protein molecule. (The first sum is over the three Cartesian components of atom $k$.) This quantity provides a general measure of the structural response of the protein when atom $k$ is perturbed. Atoms associated with large values of $G^2_{(\text{atom } k)}$ may play important roles in determining the structure of the protein.

One can also study collective structural responses of a protein molecule by diagonalizing a Green’s function matrix. Such an analysis is somewhat similar to a quasi-normal mode analysis.26-28 In a quasi-normal mode analysis, one focuses on obtaining the effective normal modes of a molecule by constructing an effective Hessian matrix from a molecular dynamics trajectory so that anharmonic effects can be accounted for approximately. The eigenvalues of the effective normal modes can be used to provide estimates of thermodynamic quantities, and the eigenvectors can give useful insights into the local and collective motions of the molecule. In a Green’s function analysis, on the other hand, one can directly identify groups of atoms that cooperate to produce large (or small) local or global structural responses of a molecule by diagonalizing a Green’s function matrix (see section on Applications). For large biomolecules, the diagonalization of a Green’s function matrix of order $3N \times 3N$ is difficult. However, the principal component analysis technique13 provides a useful alternative. As discussed earlier,29 a principal component analysis can be achieved through a singular value decomposition12 of an $m \times n$ matrix, where $m$ and $n$ are integers and need not be equal. When a singular value decomposition12 is applied to a Green’s function submatrix of dimension $m \times n$, one can study how small forces applied to the part of the protein defined by $n$ propagate to affect structure in the part of the molecule defined by $m$. A comparison of this approach with some similar methods is discussed later.
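The two manipulations described here, a per-atom response measure and a singular value decomposition of a submatrix, might be coded as follows. This is a sketch under assumptions: the per-atom measure is taken to be a sum of squared matrix elements, the coordinate ordering is assumed to be (x, y, z) per atom, and the random symmetric matrix merely stands in for a Green’s function computed from a simulation. The function names are hypothetical.

```python
import numpy as np

def atom_response(G, k):
    """Sum of G_ij^2 over all response coordinates i and the three
    Cartesian coordinates j of the perturbed atom k."""
    cols = G[:, 3 * k:3 * k + 3]
    return float(np.sum(cols**2))

def block_singular_values(G, response_atoms, perturbed_atoms):
    """Singular values of the G sub-block that couples forces on one set of
    atoms (columns) to coordinate responses of another set (rows)."""
    rows = np.concatenate([np.arange(3 * a, 3 * a + 3) for a in response_atoms])
    cols = np.concatenate([np.arange(3 * a, 3 * a + 3) for a in perturbed_atoms])
    return np.linalg.svd(G[np.ix_(rows, cols)], compute_uv=False)

# Hypothetical 4-atom Green's function: a random symmetric positive
# semidefinite 12 x 12 matrix used only to exercise the functions.
rng = np.random.default_rng(3)
A = rng.normal(size=(12, 12))
G = A @ A.T

per_atom = [atom_response(G, k) for k in range(4)]
sv = block_singular_values(G, response_atoms=[0, 1], perturbed_atoms=[2, 3])
```

Because the submatrix need not be square, the two atom sets can have different sizes. Only the singular values are requested here; passing `compute_uv=True` to `np.linalg.svd` would also return the force and response patterns associated with each of them.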
The singular value decomposition (SVD) method,12 and the similar principal component analysis method,13 are powerful computational tools for parametric sensitivity analysis of the collective effects of a group of model parameters on a group of simulated properties. The SVD method is based on an elegant theorem of linear algebra.12 The theorem states that one can represent an $m \times n$ matrix $\mathbf{M}$ by a product of three matrices:

$$\mathbf{M} = \mathbf{U}\mathbf{D}\mathbf{V}^T \qquad [24]$$

where $\mathbf{U}$ is an $m \times n$ matrix that satisfies the relation $\mathbf{U}^T\mathbf{U} = \mathbf{I}$ in which $\mathbf{I}$ is the unit matrix; $\mathbf{D}$ is a diagonal matrix, and the diagonal elements are known as singular values; and $\mathbf{V}$ is an orthogonal matrix that diagonalizes $\mathbf{M}^T\mathbf{M}$. To use this theorem to study cooperative effects of potential parameters on simulated biomolecular properties, one can take $\mathbf{M}$ to be a first-order sensitivity matrix $\mathbf{S}$ containing the elements $\partial O_i / \partial \lambda_j$. For cases involving small perturbations and responses, one can use the following equation to estimate the responses of biomolecular properties to small perturbations of model parameters:

$$d\vec{O} = \mathbf{S} \, d\vec{\lambda} \qquad [25]$$

in which the vector $d\vec{O}$ contains the elements $dO_i$ of parametric responses of computed properties, and the vector $d\vec{\lambda}$ contains the elements $d\lambda_i$ of parameter perturbations. By performing a singular value decomposition on $\mathbf{S}$, one can write $d\vec{O}$ as follows:

$$d\vec{O} = \mathbf{U}\mathbf{D}\mathbf{V}^T d\vec{\lambda} \qquad [26a]$$

Multiplying on the left by the transpose $\mathbf{U}^T$ gives

$$\mathbf{U}^T d\vec{O} = \mathbf{D}\mathbf{V}^T d\vec{\lambda} \qquad [26b]$$

Then

$$d\vec{O}' = \mathbf{D} \, d\vec{\lambda}' \qquad [26c]$$

where

$$d\vec{O}' = \mathbf{U}^T d\vec{O}$$

and

$$d\vec{\lambda}' = \mathbf{V}^T d\vec{\lambda}$$

Therefore, the elements of $d\vec{O}'$ are linear combinations of the elements of $d\vec{O}$, and the elements of $d\vec{\lambda}'$ are linear combinations of the elements of $d\vec{\lambda}$. Because $\mathbf{D}$ is a diagonal matrix, Eq. [26c] represents a set of $n$ decoupled equations. Each equation relates an element of $d\vec{O}'$ to an element of $d\vec{\lambda}'$, that is,

$$dO'_i = D_{ii} \, d\lambda'_i \qquad [27]$$

where $dO'_i$ and $d\lambda'_i$ are the $i$th elements of the vectors $d\vec{O}'$ and $d\vec{\lambda}'$, respectively, and $D_{ii}$ is the corresponding diagonal element of the matrix $\mathbf{D}$. Therefore, each element of $d\vec{\lambda}'$ is mapped into an element of $d\vec{O}'$ by the scaling factor $D_{ii}$. Because each $dO'_i$ and each $d\lambda'_i$ are linear combinations of the elements of $d\vec{O}$ and $d\vec{\lambda}$, respectively, each of the $n$ decoupled equations describes how a linear combination of parameters affects a linear combination of observables/properties. The equations associated with the largest $D_{ii}$ identify the linear combination of parameters having the most profound influence on a linear combination of biomolecular properties.
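The decoupling expressed by Eqs. [24] through [27] is easy to verify numerically. In the Python sketch below the sensitivity matrix is filled with random numbers purely as a stand-in (5 properties by 3 parameters, both hypothetical); in an application its elements would be sensitivity coefficients estimated from a simulation.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 3                        # hypothetical numbers of properties and parameters
S = rng.normal(size=(m, n))        # stand-in first-order sensitivity matrix

# Thin SVD: S = U @ diag(d) @ Vt, with U of shape (m, n) and U^T U = I (Eq. [24]).
U, d, Vt = np.linalg.svd(S, full_matrices=False)

dlam = 1e-2 * rng.normal(size=n)   # small parameter perturbation
dO = S @ dlam                      # Eq. [25]: linear response of the properties

dO_prime = U.T @ dO                # rotated property responses
dlam_prime = Vt @ dlam             # rotated parameter perturbations

# Eq. [27]: each rotated response equals one singular value times one
# rotated perturbation, with no mixing between components.
decoupled = d * dlam_prime

# Combinations associated with the largest singular value.
dominant_parameter_combination = Vt[0]
dominant_property_combination = U[:, 0]
```

`np.linalg.svd` returns the singular values in descending order, so the first row of `Vt` and the first column of `U` give the parameter and property combinations that the text identifies as most strongly coupled.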
DEPENDENCE OF SENSITIVITY RESULTS ON THE CHOICE OF FORCE FIELDS

Depending on the force field used in a molecular dynamics or a Monte Carlo simulation, sensitivity coefficients with apparently similar forms may have very different values. For example, the electrostatic energy of a biomolecule in solution may be modeled by using Coulomb’s law with effective atomic partial charges that implicitly approximate the effects of electronic polarization of the atoms of the biomolecule by their surroundings. On the other hand, explicit polarization terms can also be used in a force field to calculate the electrostatic energy of a system. Consider the familiar example of water models. Effective charge models, such as the SPC30 and the TIP3P31 models, are commonly used to study the properties of water and aqueous solutions of biomolecules. The atomic partial charges of these water models give a water monomer a dipole moment larger than the corresponding experimental gas phase value. These larger dipole moments of the water models reflect the effects of electronic polarization of water molecules by their environment in the liquid phase. Without larger magnitudes for the atomic partial charges to yield a larger effective dipole moment of a water molecule, a nonpolarizable water model such as the SPC or TIP3P model would underestimate the electrostatic energy of liquid water. More sophisticated water models account for atomic or molecular polarizability explicitly, so that water molecules can be polarized differently according to their environment.32-40 In these polarizable water models, the atomic partial charges are usually chosen to reproduce the gas phase dipole moment of a water monomer. These atomic partial charges are different from the corresponding charges used in the effective charge water models.
If one calculates the sensitivity coefficients of different properties of an aqueous system with respect to the atomic partial charges of the water models, the sensitivity coefficients obtained from an effective charge model can be different from those of a polarizable model. This difference was found in calculations of the charge sensitivities of different properties of liquid water with two effective charge models (the SPC30 and the TIP3P31 models) and a polarizable water model17,40,41 (Table 1), although the differences were more pronounced for sensitivity coefficients of some types than for others. Therefore, in using sensitivity coefficients to help identify the determinants of (bio)molecular properties, one needs to keep in mind the types of potential functions used in a simulation. For the example of liquid water, a simulated water property may respond to the perturbation of an atomic partial charge of an effective charge water model and the perturbation of an apparently similar atomic partial charge of a polarizable water model in different ways. In a polarizable water model, one can perturb an atomic partial charge with the electronic polarizability of the atom held fixed. On the other hand, since an atomic partial charge of an effective charge model has implicitly included the effects of electronic polarizability in an approximate manner, perturbing an atomic partial charge of an effective charge model also has the effect of modifying the electronic polarizability of the water model. Therefore, just as in genetic studies, in which different point mutations may reflect different determining factors of the properties of a protein, the perturbation of apparently similar parameters of potential models of different types may reflect different determining factors of the properties of a biomolecular system. A user of sensitivity analysis needs to keep this in mind to make proper inferences from sensitivity data.

Table 1. Sensitivity of Several Properties of Liquid Water to the Perturbation of the Atomic Partial Charges of the SPC, TIP3P, and Polarizable Water Models^a

Sensitivity coefficient             Internal Energy   Pressure           Thermal Pressure        Kirkwood G_k
                                    (kcal/mol)        (atm)              Coefficient (atm/deg)   factor
q_ox ∂O/∂q_ox, SPC                  −27.4 ± 0.2       −32,000 ± 1000     −1.9 ± 0.6              0.2 ± 0.7
q_ox ∂O/∂q_ox, TIP3P                −25.6 ± 0.2       −29,400 ± 700      −2.7 ± 0.5              −0.0 ± 0.4
q_ox ∂O/∂q_ox, Polarizable          −22.6 ± 0.2       −17,500 ± 500      −0.7 ± 0.6              0.2 ± 0.6
q_Hyd ∂O/∂q_Hyd, SPC                −4.7 ± 0.6        −58,800 ± 500      −1.3 ± 0.6              0.2 ± 0.7
q_Hyd ∂O/∂q_Hyd, TIP3P              −5.4 ± 0.3        −55,500 ± 400      −1.6 ± 0.5              0.2 ± 0.4
q_Hyd ∂O/∂q_Hyd, Polarizable        −0.2 ± 0.2        −44,700 ± 300      0.3 ± 0.6               0.6 ± 0.5

^a The SPC model is from Refs. 17 and 30, the TIP3P model is from Refs. 17 and 31, and the polarizable water model is from Refs. 38-40. O, q_ox, and q_Hyd denote, respectively, a property of liquid water, the atomic partial charge of the oxygen in a water model, and the atomic partial charge of the hydrogen in the water model.
CONVERGENCE ISSUES

As in simulating any other properties of (bio)molecular systems, it is important to address the convergence characteristics of sensitivity coefficients of different types. The speed of convergence of sensitivity coefficients depends on the observables/properties and on the model parameters involved. The error bars for the simulation of several properties of liquid water in Table 1 illustrate the convergence characteristics of several types of sensitivity coefficient involving atomic partial charges. It is clear that the charge sensitivities of the internal energy and the pressure of liquid water have much smaller relative statistical errors than that of the Kirkwood $G_k$ factor. The Kirkwood $G_k$ factor is often calculated by using the expression:

where $\vec{\mu}_i$ is the molecular dipole moment of water molecule $i$, and $N$ is the total number of water molecules in a simulated system. Thus, the Kirkwood factor $G_k$ is related to the fluctuation of the collective dipole moment $\sum_i \vec{\mu}_i$ of the water molecules in a simulation cell; it is well known that to obtain reliable estimates of this fluctuational property, one needs to carry out long simulations. Sensitivity coefficients relating the free energy of a system to the atomic partial charges are among the sensitivity coefficients that can converge quickly.
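To make the convergence point concrete, the Python sketch below evaluates a Kirkwood-type factor from molecular dipoles and repeats the estimate on consecutive trajectory segments. The specific form used, $\langle |\sum_i \vec{\mu}_i|^2 \rangle / (N \langle \mu^2 \rangle)$, is an assumption: it is one common textbook definition and is not claimed to be the expression used in the studies cited here (other definitions subtract the squared mean dipole or include boundary-condition factors). The synthetic dipoles and the function names are likewise hypothetical.

```python
import numpy as np

def kirkwood_factor(dipoles):
    """Assumed form G_k = <|sum_i mu_i|^2> / (N <mu^2>) for an array of
    molecular dipoles with shape (n_frames, N, 3)."""
    n_mol = dipoles.shape[1]
    total = dipoles.sum(axis=1)                    # collective dipole of each frame
    m2 = np.mean(np.sum(total**2, axis=1))         # <|M|^2>
    mu2 = np.mean(np.sum(dipoles**2, axis=2))      # <mu^2> per molecule
    return float(m2 / (n_mol * mu2))

def block_estimates(dipoles, n_blocks):
    """The same estimate on consecutive trajectory segments, to judge convergence."""
    blocks = np.array_split(dipoles, n_blocks, axis=0)
    return np.array([kirkwood_factor(b) for b in blocks])

# Synthetic check: independent, randomly oriented unit dipoles have no
# orientational correlation, so the factor should be close to 1.
rng = np.random.default_rng(5)
v = rng.normal(size=(4000, 64, 3))
dipoles = v / np.linalg.norm(v, axis=2, keepdims=True)

gk = kirkwood_factor(dipoles)
blocks = block_estimates(dipoles, 4)
spread = blocks.std(ddof=1)    # scatter between segments, a rough convergence indicator
```

Comparing segment estimates in this way mirrors the comparison of trajectory segments used in the figures of this section. Frames of a real trajectory are correlated in time, so the scatter between segments will generally be larger than for the independent samples generated here.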
Figure 1 Charge sensitivities of the free energy of a solution of glycine dipeptide in methanol. The first three atoms/extended atoms (CM1, C, and O) build up the acetyl N-terminal blocking group. The next five atoms/extended atoms (N, H, CA, C, and O) represent components of a glycine unit. The last three atoms/extended atoms at the C-terminal end (N, H, and CM2) terminate the peptide as an N-methyl amide.
Figure 1 illustrates the calculation of the charge sensitivity of the free energy of an organic compound, N-acetylglycine-N'-methylamide (also called glycine dipeptide or terminally blocked glycine by some authors), in methanol. One can see that the charge sensitivities of the free energy of this system obtained from three 10 ps segments of a 30 ps molecular dynamics simulation are quite similar, even though a large portion of the φ-ψ space has been sampled during the 30 ps simulation (Figure 2). Figure 3 illustrates results obtained from a simulation of the protein bovine pancreatic trypsin inhibitor (BPTI) in a pseudosolvent environment. The simulation was carried out without including explicit bulk solvent molecules, and the screening effects of the bulk solvent on the intramolecular electrostatic interactions were modeled by scaling the charges of the basic and acidic residues so that they retained a net charge of zero, a method employed by the GROMOS molecular dynamics package.16 Figure 3 shows the similarity of results for the sensitivity of the free energy of the protein upon perturbation of the atomic partial charges of the amide nitrogens obtained from two 70 ps segments of a molecular dynamics simulation, suggesting that most of the sensitivity coefficients had converged reasonably well. However, the sensitivity coefficients of a few residues (e.g., near residue Arg42) require longer simulation times to achieve statistics comparable to those for the other sensitivity coefficients. This observation is not surprising. Different regions of a protein can differ in flexibility, and the sensitivity coefficients associated with the partial charges located in those flexible
Figure 2 The φ-ψ space sampled by a 30 ps molecular dynamics simulation of a solution of glycine dipeptide in methanol.
regions are expected to converge more slowly. This behavior was also found in avian pancreatic polypeptide (APP).19 The example presented for BPTI also highlights the discussion above, which indicated that sensitivity coefficients associated with apparently similar
Figure 3 Sensitivity of the free energy of bovine pancreatic trypsin inhibitor to perturbations of the atomic partial charges associated with the amide nitrogens of BPTI.
parameters may be very different. All the sensitivity coefficients presented in Figure 3 were associated with the same single property (the free energy of the protein) and the same type of parameter (the atomic partial charge of the amide nitrogens, which has one value in the GROMOS force field16). Nonetheless, it is clear from Figure 3 that the values of these sensitivity coefficients were quite different, reflecting the different environments of the different amide nitrogens in the protein. Parametric structural sensitivities of the protein avian pancreatic polypeptide have also been studied by Zhang et al.,19 who found that these parametric structural sensitivities converged more slowly than parametric free energy sensitivities.

The choice of nonbonded cutoffs is another important factor to consider when one is estimating the statistical errors of simulated properties. Many simulations have been carried out by using nonbonded cutoffs (i.e., the interaction potential between two nonbonded atoms is calculated only when the distance separating them is smaller than a user-chosen value). The nonbonded cutoffs for calculating the long-range electrostatic interactions are often only 8 Å and rarely exceed 15 Å. The neglect of long-range electrostatic interactions beyond these relatively small values of nonbonded cutoffs may lead to overestimates of the fluctuational properties of simulated biomolecular systems. The sensitivity $R_{cut} \, \partial G_k / \partial R_{cut}$ of the Kirkwood $G_k$ factor of liquid water to a choice of the nonbonded cutoff value $R_{cut}$ has been studied.40,41 This cutoff is slightly different from the nonbonded cutoff discussed above. Instead of neglecting the interaction energy between atom A and all the atoms more distant from A than the nonbonded cutoff value, $R_{cut}$ separates a space around an atom into an inner region and an outer region. In the inner region, the electrostatic interactions between atom A and any atom within the inner region are calculated explicitly. The outer region is treated as a dielectric continuum. For this reaction field type of model, $R_{cut} \, \partial G_k / \partial R_{cut}$ was found to be negative for the flexible SPC model, the flexible TIP3P model, and a flexible polarizable water model,17,40,41 suggesting that a better model for properly including long-range electrostatic interactions could reduce the molecular fluctuational properties. (Recall that the Kirkwood factor is proportional to the fluctuation of the collective dipole moment of the water molecules in a simulation cell.) Cutoff sensitivity was further illustrated by a recent Brownian dynamics simulation in which the fluctuation of the collective dipole moment of a NaCl solution calculated by using a 40 Å nonbonded cutoff was ~25% smaller than that obtained by using a 20 Å cutoff. Because the statistical error of a simulated quantity is related to the fluctuation of the property (larger statistical error being expected for a larger fluctuation), a simulation using shorter nonbonded cutoffs may give a larger statistical error than one using longer cutoffs for simulations of the same duration. Keep in mind, too, that a simulated quantity may be directly affected by the choice of the magnitude of the nonbonded cutoff. The simulations of the potentials of mean force between ion pairs illustrate this.44,45 The choice of nonbonded cutoff values can be an important factor to consider in simulating biomolecules that are stabilized by relatively weak forces. The larger structural fluctuation resulting from the use of a short nonbonded cutoff may unfold a protein or a DNA during a molecular dynamics or Monte Carlo simulation. This artifact can reveal itself quickly in simulations of a short helix, for example, whose structure is stabilized by relatively few interactions.46
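The simple truncation scheme described in this section, in which a pair interaction is evaluated only when the two atoms are closer than a user-chosen distance, can be sketched in Python as follows. The coordinates, charges, box size, and reduced units are all hypothetical, periodic images are ignored, and the reaction field variant with a dielectric continuum outside the cutoff is not implemented.

```python
import numpy as np

def coulomb_energy(pos, q, r_cut=None):
    """Sum of q_i*q_j/r_ij over distinct pairs (reduced units, no periodic
    images); pairs with r_ij >= r_cut are skipped when a cutoff is given."""
    i, j = np.triu_indices(len(q), k=1)
    r = np.linalg.norm(pos[i] - pos[j], axis=1)
    qq = q[i] * q[j]
    if r_cut is not None:
        keep = r < r_cut
        r, qq = r[keep], qq[keep]
    return float(np.sum(qq / r))

rng = np.random.default_rng(6)
pos = rng.uniform(0.0, 40.0, size=(200, 3))   # hypothetical coordinates in a 40-unit box
q = np.tile([0.4, -0.4], 100)                 # hypothetical partial charges, neutral overall

e_full = coulomb_energy(pos, q)
e_by_cutoff = {rc: coulomb_energy(pos, q, rc) for rc in (8.0, 15.0, 100.0)}
```

Evaluating the same configuration at several cutoffs gives a direct, if crude, picture of how much of the electrostatic energy is discarded by truncation. A cutoff larger than the largest interatomic distance (100 here) reproduces the untruncated sum.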
APPLICATIONS

Determinants of (Bio)molecular Properties

Sensitivity analysis can be applied in the identification of the features of a force field model or the key components of a (bio)molecular system that are most significant in determining the properties of the system. It is not always straightforward to identify these key features. The interactions operating within typical biomolecular systems are characterized by many short-range and long-range interactions; the complex interplay among these interactions of different types often makes it difficult to predict by intuition alone the role of each in influencing the properties of a (bio)molecular system. Several illustrative examples are given below.
Liquid Water

Liquid water is an important medium in which most biomolecules perform their function. Pure liquid water may seem like a very simple system, where one may be able to identify the determining factors of its properties easily, but this is not always found to be the case. Table 2 gives examples of studying several thermodynamic properties of liquid water using a flexible SPC water model in a molecular dynamics simulation.17 This flexible water model was characterized by the following interaction potential:
The indexes 1and k in the last term represent either 0 or H, i and j label water molecules; E and u are the Lennard-Jones energy well depth and radius, respectively, the superscript O is used to label reference (idealized) distances; kjare force constants; and N the number of water molecules in the simulations. The
Table 2. The Most Significant Parameters of the Flexible SPC Water Model That Control Different Thermodynamic Properties of Liquid Water(a)
Parameters: r_OH, r_HH, k_1, σ, ε, q_O, q_H, R_cut
Properties: Free Energy (kcal/mol); Entropy (cal mol^-1 deg^-1); Internal Energy (kcal/mol); Pressure (atm); Thermal Pressure Coefficient (atm/deg); Heat Capacity (cal mol^-1 deg^-1)
Entries, in the order in which they survive in the source: -27.4 ± 0.2; 58 ± 1; -62.5 ± 0.8; -27.4 ± 0.2; 59 ± 1; 3.3 ± 0.8; -7.9 ± 0.6; 2.9 ± 0.2; -14 ± 1; -64.9 ± 0.8; 64,400 ± 1000; 44,000 ± 1000; -32,000 ± 1000; -58,800 ± 500; 6 ± 3; -4 ± 1; 12 ± 2; -4 ± 7; 4 ± 2; 4 ± 1; -4.5 ± 0.8; -5 ± 4; 6 ± 1; -24 ± 3
(a) The results presented are the sensitivity coefficients λ ∂O/∂λ, in which λ is one of the potential or model parameters: r_OH (equilibrium bond length of the flexible SPC model), r_HH (equilibrium distance between the two hydrogens of a water molecule in the flexible SPC model), k_1 (harmonic force constant of the bond), σ (Lennard-Jones repulsive parameter of the water model), ε (Lennard-Jones well depth of the water model), q_O (atomic partial charge on the water oxygens), q_H (atomic partial charge on the water hydrogens), and R_cut (cutoff radius in the reaction field model), respectively.
cutoff radius R_cut in the reaction field geometry was chosen to be half the basic cubic box length of 19.726 Å. Because Lennard-Jones energies were calculated only between two water oxygens, each r_ij was the distance between the oxygen of molecule i and that of molecule j. The inclusion of the cross terms in the intramolecular vibrational degrees of freedom allowed couplings among the valence bond angle and the bond lengths of a water molecule. ε_RF is the dielectric constant of the outer region in the reaction field model and was chosen to be 80 in the simulations. The parameters giving the largest sensitivity for each of the selected thermodynamic properties are listed in Table 2. It is clear that different properties of liquid water are controlled by different features of the water model. The free energy of liquid water was found to be affected most by perturbations of the equilibrium bond length and the Lennard-Jones repulsive parameter of the water model, not by perturbations of the atomic partial charges. This result reflects the occurrence of a hydrogen bond network in liquid water. Increasing the equilibrium bond length or decreasing the Lennard-Jones repulsive parameter of the water model enhances the formation of hydrogen bonds, consequently decreasing the free energy of the model liquid water. However, this energetic argument is valid only when the free energy of liquid water is dominated by the contribution from the internal energy rather than by the entropy.
This is the case for the flexible SPC water model because one finds similar parametric sensitivities of the free energy and the internal energy (Table 2).17 The entropy of the liquid, on the other hand, was found to be controlled by a different set of potential parameters of the water model.17 In particular, the flexibility associated with the O-H stretching mode played the most significant role in determining the entropy of liquid water. Making the O-H bond more rigid, by increasing the harmonic force constant k_1, decreased the entropy. As expected, increasing the equilibrium bond length of the water model also decreased the entropy of the liquid, because the enhancement of the intermolecular hydrogen bonds gave the liquid a more icelike character. The sensitivity analysis study also revealed that long-range interactions can play an important role in determining the entropy of the liquid. Analysis of the pressure of the liquid gave additional information.17 Whereas it is reasonable to expect that increasing the Lennard-Jones repulsive parameter will increase the pressure of the liquid at constant volume, and that increasing the magnitude of the atomic partial charges will decrease the pressure of the liquid, it is hard to predict on the basis of intuition alone that increasing the magnitude of the Lennard-Jones well depth may increase the pressure of the liquid.
The effects of different potential and model parameters on different structural and energetic distribution functions (including radial distribution functions between water atoms, the distribution function of the interaction energy of a water molecule with its surroundings [P(u) of Figure 4], and the distribution function of the local electric field at the oxygen of a water molecule projected along the permanent dipole moment vector of a water molecule) were studied by Zhu and Wong. Examples from that study are shown in Figure 5 for the flexible SPC water model.17 It is clear from Figure 5 that these potential/model parameters affected the distribution function P(u) of the interaction energy of a water molecule with its surroundings very differently. Perturbations of the O-H bond harmonic force constant k_1 had a larger influence on the peak and the high energy wing of P(u). Increasing the O-H bond flexibility, effected by decreasing its harmonic force constant, shifted the peak of P(u) toward more favorable interaction energy and reduced the fraction of water molecules having high interaction energies. Increasing the magnitude of the partial charge of the oxygen q_O in the water model broadened the distribution P(u), resulting in increases in both the fraction of water molecules having favorable (negative) interaction energies and the fraction having unfavorable (positive) interaction energies. The increase in the low energy wing of P(u) was greater than that in the high energy wing, so a net gain of water molecules having favorable energies resulted. This observation is consistent with the finding (Tables 1 and 2) that increasing the magnitude of the atomic partial charges of the water oxygens decreased the internal energy of the liquid. Perturbing the atomic partial charges of the water hydrogens q_H had little effect on the low energy wing of P(u) but had a significant effect on the high energy wing. Increasing the nonbonded cutoff R_cut of the reaction field model had significant effects on the far wings of P(u), decreasing the fraction of water molecules having very favorable interaction energies and increasing the fraction of water molecules having very unfavorable energies.
Figure 4 Distribution function P(u) describing the interaction energy u (kcal/mol) of a water molecule with its surroundings.
These examples indicate the complexity of the relationships between the properties of liquid water and the model parameters determining its intramolecular and intermolecular interactions. Many of the results discussed in this section for liquid water are hard to predict by intuition alone. Water is a comparatively simple system in (bio)molecular modeling, and it is even harder to identify the determinants of the properties of complex biomolecular systems. Sensitivity analysis should thus be a useful tool for sorting out the key contributing factors of interesting biomolecular properties.
Two-Dimensional Square Lattice Model of Protein Folding
Sensitivity analysis has been applied to a two-dimensional square lattice model of protein folding.18 In this example, each residue of a model polypeptide could occupy only the lattice points of a two-dimensional square lattice, and only two types of residue, hydrophobic and hydrophilic, were assumed to exist. Two conformations of a 10-residue model polypeptide on a two-dimensional square lattice are shown in Figure 6. The model polypeptide gained stabilization energy only when two nonbonded hydrophobic groups were in contact, so a conformation with more hydrophobic contacts had a lower energy than a conformation with fewer. The two conformations of Figure 6 have the same energy if all the residues are hydrophobic because they have the same number (4) of nonbonded hydrophobic contacts. To facilitate sensitivity analysis studies, Bleil et al.18 wrote the interaction energy between two residues i and j in contact as the product of two energy parameters ε_i and ε_j, in which ε_i = 0 if residue i is a hydrophilic residue and (ε_i)^2 = ε_hh if residue i is a hydrophobic residue. A sensitivity analysis study was carried out by calculating first-order and second-order sensitivity coefficients of the forms ∂O/∂ε_i and ∂²O/∂ε_i∂ε_j, in which O represented a thermodynamic or ensemble-averaged property of a model polypeptide in the two-dimensional square lattice representation. Figure 7 offers an example of the cooperative effects of two hydrophobic residues on the entropy of a 10-residue model polypeptide (with a homogeneous sequence of hydrophobic residues) on the two-dimensional square lattice. The scaled second-order derivative of the entropy of the model polypeptide with respect to each pair ε_i and ε_j, (∂²S/∂ε_i∂ε_j)ε_iε_j, is displayed.
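Because the lattice model is small enough to enumerate exactly, the scaled second-order sensitivity of the entropy can be sketched in a few lines. The following is a minimal illustration, not the code of Bleil et al.: it assumes that energies are expressed in units of k_B T, that each contact contributes -ε_i ε_j (stabilizing), that conformations are counted with the first bond fixed along +x, and that derivatives are taken by central finite differences. All function names are hypothetical.

```python
import math
from itertools import combinations

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def enumerate_walks(n):
    """Self-avoiding walks of n residues on a square lattice (first bond fixed along +x)."""
    walks = []

    def grow(path, occupied):
        if len(path) == n:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                grow(path, occupied)
                path.pop()
                occupied.remove(nxt)

    grow([(0, 0), (1, 0)], {(0, 0), (1, 0)})
    return walks


def contacts(walk):
    """Nonbonded residue pairs (i, j) sitting on adjacent lattice points."""
    return [(i, j) for i, j in combinations(range(len(walk)), 2)
            if j - i >= 3
            and abs(walk[i][0] - walk[j][0]) + abs(walk[i][1] - walk[j][1]) == 1]


def entropy(eps, contact_lists):
    """S / k_B = ln Z + <E>, with energies in units of k_B T (assumed convention)."""
    energies = [-sum(eps[i] * eps[j] for i, j in c) for c in contact_lists]
    e_min = min(energies)
    weights = [math.exp(-(e - e_min)) for e in energies]  # shifted for stability
    z = sum(weights)
    mean_e = sum(w * e for w, e in zip(weights, energies)) / z
    return math.log(z) - e_min + mean_e


def scaled_second_order(i, j, eps, contact_lists, h=1e-3):
    """(d2S / d eps_i d eps_j) * eps_i * eps_j for i != j, by central differences."""
    def s(di, dj):
        e = list(eps)
        e[i] += di
        e[j] += dj
        return entropy(e, contact_lists)

    d2 = (s(h, h) - s(h, -h) - s(-h, h) + s(-h, -h)) / (4.0 * h * h)
    return d2 * eps[i] * eps[j]


if __name__ == "__main__":
    walks = enumerate_walks(10)
    contact_lists = [contacts(w) for w in walks]
    eps = [math.sqrt(0.01)] * 10  # homogeneous hydrophobic chain, eps_hh = 0.01 k_B T
    print(len(walks), scaled_second_order(0, 3, eps, contact_lists))
```

Precomputing the contact list of every conformation once keeps each entropy evaluation to a single pass over the conformations, so scanning all residue pairs and several values of ε_hh remains inexpensive for a 10-residue chain.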
The pattern of sensitivity coefficients was found to be complex and dependent on the hydrophobic interaction energy parameter ε_hh. At very low hydrophobic interaction energies, 1% k_B T, the most prominent cooperative effects arose from (i, i + 3) residue pairs. The cooperative effects from (i, i + 5), (i, i + 7), and (i, i + 9) pairs were also significant. Their sensitivity coefficients were mostly negative, suggesting that the entropy of the model polypeptide decreased when the hydrophobicity of both residues in each of these pairs increased. In the two-dimensional square lattice model, residue pairs separated by even numbers of residues [type 1, i.e., (i, i + 3), (i, i + 5), (i, i + 7), and (i, i + 9) pairs] could be in direct contact in the lattice, and it was therefore not surprising to see significant cooperativity between these residue pairs in affecting the entropy of the model polypeptide. On the other hand, residue pairs separated by odd numbers of residues [type 2, i.e., (i, i + 4), (i, i + 6), and (i, i + 8) pairs] could not be in direct contact in the lattice and thus could not contribute to the total hydrophobic energy of the model polypeptide. These pairs did not show prominent cooperativity at low hydrophobic energies. However, as the hydrophobic interaction energy parameter ε_hh was increased to ~0.5 k_B T, type 2 residue pairs gave significant positive cooperativity. As the hydrophobic energy parameter was increased further to ~1 k_B T, type 1 residue pairs gave positive cooperative effects instead of the negative ones observed at low hydrophobic energies. When the hydrophobic energy parameter was around 2 k_B T, all the residue pairs showed positive cooperativity with comparable magnitudes. At even higher hydrophobic energies (near 5 k_B T), most type 1 residue pairs became less cooperative than type 2 residue pairs in affecting the entropy of the model polypeptide. Therefore, the simple two-dimensional square lattice model of protein folding, which was composed of a homogeneous sequence of hydrophobic residues equal in hydrophobicity, exhibited cooperative effects highly dependent on the locations of the residues in the sequence, the hydrophobic energy, and the temperature. More importantly, even residue pairs that had no direct interaction energies could show cooperative/anticooperative effects in influencing system properties. This finding is consistent with the experimental observation that the free energy changes due to multiple mutations could sum nonadditively from the free energy changes due to single mutations even though the residues that were mutated were separated by large distances and therefore had negligible direct interaction energies.47-49
Figure 6 Two conformations of a 10-residue model peptide on a two-dimensional square lattice (panel labels: helix; antiparallel β strand).
Figure 7 Scaled second-order sensitivity (∂²S/∂ε_i∂ε_j)ε_iε_j of the entropy of a 10-residue model polypeptide for five values of ε_hh: (a) 0.01, (b) 0.5, (c) 1.0, (d) 2.0, and (e) 5.0 k_B T.
Sensitivity coefficients of second and higher order measure nonadditive effects. Consider the expression
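$$\Delta S = \sum_i \frac{\partial S}{\partial \varepsilon_i}\,\Delta\varepsilon_i + \frac{1}{2}\sum_{i,j}\frac{\partial^2 S}{\partial \varepsilon_i\,\partial \varepsilon_j}\,\Delta\varepsilon_i\,\Delta\varepsilon_j + \cdots$$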
for estimating the entropy change ΔS of a model polypeptide. If all the sensitivity coefficients of second and higher order are zero, only the linear terms remain, and the total entropy change can be expressed as a sum of the entropy changes resulting from single "mutations." The same argument applies to the study of free energy and internal energy changes. Conversely, our findings suggest the need for care when double mutation experiments are used to probe residue pair interactions in proteins. The study of the two-dimensional square lattice model provides a clear counterexample in which cooperative effects between two residues can occur even though there is no direct interaction between them. Therefore, results from double mutation experiments50 may not necessarily reflect residue pair interactions. This simple model of protein folding was provided to show that it is not always trivial to identify the determinants of the properties of (bio)molecular systems. The complexity of the problem can increase further for more complicated and realistic models of (bio)molecular systems. Sensitivity analysis should therefore be a useful tool for sorting out the significant factors that determine the properties of these complex systems.
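The link between second-order sensitivity coefficients and nonadditivity can be made concrete with a double-mutant cycle. The sketch below is an illustration under stated assumptions rather than a method taken from the cited work: `observable` stands for any property O(ε) that can be evaluated for a given parameter vector, the function names are hypothetical, and the toy observable at the end is chosen only because its cross derivative is known exactly.

```python
def double_mutant_nonadditivity(observable, eps, i, j, d_i, d_j):
    """O(both mutated) - O(i mutated) - O(j mutated) + O(unmutated)."""
    def shifted(di, dj):
        e = list(eps)
        e[i] += di
        e[j] += dj
        return observable(e)

    return shifted(d_i, d_j) - shifted(d_i, 0.0) - shifted(0.0, d_j) + shifted(0.0, 0.0)


def second_order_estimate(observable, eps, i, j, d_i, d_j, h=1e-3):
    """Leading Taylor estimate of the nonadditivity: (d2O / d eps_i d eps_j) * d_i * d_j."""
    def shifted(di, dj):
        e = list(eps)
        e[i] += di
        e[j] += dj
        return observable(e)

    cross = (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)
    return cross * d_i * d_j


def toy_observable(e):
    """Hypothetical property with a single cross term e[0] * e[1]."""
    return e[0] * e[1] + e[0] ** 2 + 3.0 * e[1]


if __name__ == "__main__":
    eps0 = [1.0, 2.0]
    print(double_mutant_nonadditivity(toy_observable, eps0, 0, 1, 0.2, 0.3))
    print(second_order_estimate(toy_observable, eps0, 0, 1, 0.2, 0.3))
```

When the cross derivative vanishes, the two single-mutation changes add up to the double-mutation change; a nonzero value signals cooperativity whether or not the two residues interact directly.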
Molecular Recognition
Just as it may not be trivial to identify the determinants of the properties of complex biological systems, it is also not necessarily straightforward to suggest effective modifications of (bio)molecules needed to achieve desired biological effects. Sensitivity analysis might be a useful tool for guiding the design of novel bioactive molecules. A few examples of sensitivity analysis relating to molecular design have been published.8,51-53 In these examples, no bioactive compounds were actually designed, but the process of association of molecular systems was modeled. Cieplak et al.52 examined the use of free energy derivative calculations to suggest what types of cation may be bound most strongly by 18-crown-6. In that work, the authors calculated sensitivity coefficients of the form ∂ΔG/∂λ_i, in which ΔG is a binding free energy of a cation by 18-crown-6, and λ_i is a nonbonded interaction parameter. They found that values of ∂ΔG/∂R_i*, where R_i* is the atomic radius of a cation, were -10.9 kcal/(mol·Å), 2.7 kcal/(mol·Å), and 4.5 kcal/(mol·Å) for Na+, K+, and Rb+, respectively. The negative sign of ∂ΔG/∂R_i* for Na+ suggests that a cation larger than Na+ may bind more strongly to 18-crown-6, and the positive sign of ∂ΔG/∂R_i* for K+ and Rb+ suggests that a cation smaller than K+ and Rb+ may bind more favorably to 18-crown-6. Recalling that the size of these ions increases in the order Na+ < K+ < Rb+, and that the charges of these ions are similar, these results suggest that a fictional ion of size between that of Na+ and K+ is optimal for binding to 18-crown-6. However, the magnitudes of ∂ΔG/∂R_i* for Na+ and K+ suggest that the size of this optimal ion is closer to K+ than to Na+; this result is consistent with the experimental finding that 18-crown-6 binds more tightly to K+ than to Na+ or Rb+. By calculating values of ∂ΔG/∂λ_i involving the nonbonded parameters of 18-crown-6, these authors also speculated that suitable modifications to 18-crown-6 that would increase the negative charges of the crown ether oxygens might improve the binding to K+.
Because first-order sensitivity coefficients are easier to calculate than higher order sensitivity coefficients, it is likely that the former will be used more frequently in guiding molecular design. However, first-order sensitivity theory can provide reliable predictions only when the properties of interest respond approximately linearly to changes in the model parameters. This linear response limit is satisfied when the perturbations of model parameters are small. For certain applications, such as protein engineering, where one amino acid is mutated into another, the linear response approximation may fail to reliably predict the change in the properties of a protein resulting from a point mutation. It is therefore useful to examine in more detail how well first-order sensitivity theory performs in guiding such predictions. The two-dimensional square lattice protein folding model discussed earlier provides a simple basis for probing this issue. The model has the advantage of allowing one to carry out many exact calculations to check the predictions from first-order sensitivity theory. Unlike molecular dynamics or Monte Carlo simulations, there are no statistical errors or convergence problems associated with the calculations of the properties, and their parametric derivatives, of a model polypeptide on a two-dimensional square lattice. Starting from many 10-residue model polypeptides with different sequences, Bleil et al.18 made different mutations (corresponding to changing a
hydrophobic residue into a less hydrophobic or a hydrophilic residue, or to changing a hydrophilic residue into a hydrophobic one) and compared the exact results obtained by making explicit mutations with the approximate results obtained from first-order sensitivity theory. In the exact calculations, the change in a property of a model polypeptide was obtained by computing ΔO = O_new - O_old, in which O_new and O_old were the property calculated after and before mutation, respectively. When first-order sensitivity theory was used, ΔO was calculated by using ΔO = (∂O/∂ε_i)Δε_i, in which ε_i is the hydrophobic energy parameter of residue i defined earlier, and Δε_i is the change of the value of ε_i due to a mutation. The following 13 observables O were included in that study: free energy, entropy, internal energy, equilibrium constants between different classes of conformations characterized by having different numbers of hydrophobic contacts, averaged compactness, averaged number of hydrophobic-hydrophobic interactions, averaged number of bends, averaged number of hydrophobic-hydrophobic interactions per bend, and averaged number of buried residues that were hydrophobic. Table 3 summarizes the ability of first-order sensitivity theory to correctly predict the direction of change due to the mutations. It is clear from the data in Table 3 that first-order sensitivity theory works best when ε_i and Δε_i are both small. When ε_i and Δε_i are of the order of 2 k_B T, the predictive reliability decreases to ~75%. Therefore, first-order sensitivity theory does not always give correct predictions. However, since first-order sensitivity coefficients can usually be calculated more easily than higher order sensitivity coefficients in (bio)molecular simulations, first-order sensitivity coefficients can be used as a preliminary screening tool for suggesting a small number of modifications to a (bio)molecule that may lead to the desired biological effect.
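The percentages reported in Table 3 follow directly from the tallies of correct and incorrect sign predictions, and the same bookkeeping applies to any comparison of first-order estimates with exact changes. The short sketch below reproduces that arithmetic from the counts given in Table 3; the helper names are hypothetical, and the sign test is one plausible reading of "direction of change," not necessarily the criterion used in Ref. 18.

```python
# (eps_i, delta_eps_i, correct, incorrect); energies in units of k_B T, counts from Table 3
TABLE_3 = [
    (0.1, 0.1, 133, 14),
    (1.0, 1.0, 128, 25),
    (2.0, 2.0, 119, 36),
    (2.0, 1.0, 152, 56),
]


def percent_correct(correct, incorrect):
    """Share of mutations whose direction of change was predicted correctly."""
    return 100.0 * correct / (correct + incorrect)


def same_direction(exact_change, first_order_change):
    """True when the first-order estimate has the same sign as the exact change."""
    return exact_change * first_order_change > 0.0


if __name__ == "__main__":
    for eps, d_eps, ok, bad in TABLE_3:
        print(eps, d_eps, round(percent_correct(ok, bad)))
    total_ok = sum(row[2] for row in TABLE_3)
    total_bad = sum(row[3] for row in TABLE_3)
    print("overall", round(percent_correct(total_ok, total_bad)))
```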
More sophisticated (but usually more expensive) calculations and/or suitable experimental studies can then be carried out to identify, from this small number of suggestions, those most likely to achieve the desired biological effects. If experimentation is easier, the predictions can be tested in the laboratory.

Table 3. Predicted Results of Mutations for a Two-Dimensional Square Lattice Model of Protein Folding Using First-Order Sensitivity Theory(a)
ε_i (k_B T)   Δε_i (k_B T)   Correct   Incorrect   % Correct
0.1           0.1            133       14          90
1             1              128       25          84
2             2              119       36          77
2             1              152       56          73
Total                        532       131         80 (avg)
(a) Data from Ref. 18.

An obvious extension of first-order sensitivity theory is to develop higher order theories utilizing higher order sensitivity coefficients. For example, some investigators have considered the Gaussian-type approximations employed in free energy calculations.54-56 Specific applications include estimating the pKa's of acidic and basic residues of proteins55 and the excitation energies of chromophores.56 The Gaussian-type approximations can be conveniently arrived at from Zwanzig's statistical mechanical perturbation theory,57 the derivation of which follows. The free energy difference ΔA between system b and system a can be written as follows:
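$$\Delta A = A_b - A_a = -k_B T \ln \frac{\int e^{-\beta H_b}\, d\Gamma}{\int e^{-\beta H_a}\, d\Gamma}$$
in which A_b and A_a are the free energies of system b and system a, respectively, H_b and H_a are the classical Hamiltonians of system b and system a, respectively, Γ denotes phase space variables (atomic coordinates and their conjugate momenta), and β = 1/(k_B T), where k_B is the Boltzmann constant and T is the absolute temperature. This equation can be further written as follows:
$$\Delta A = -k_B T \ln \frac{\int e^{-\beta \Delta H}\, e^{-\beta H_a}\, d\Gamma}{\int e^{-\beta H_a}\, d\Gamma}$$
where ΔH = H_b - H_a. By definition,
$$\frac{\int e^{-\beta \Delta H}\, e^{-\beta H_a}\, d\Gamma}{\int e^{-\beta H_a}\, d\Gamma} = \left\langle e^{-\beta \Delta H} \right\rangle_a$$
where the last term denotes an average over an ensemble of system a. Therefore
$$\Delta A = -k_B T \ln \left\langle e^{-\beta \Delta H} \right\rangle_a$$
which is the perturbation formula first derived by Zwanzig in 1954.57 The exponential and logarithmic terms of Eq. [31] can be expanded in a Taylor series of ΔH, and the equation
$$\Delta A = \langle \Delta H \rangle_a - \frac{1}{2 k_B T}\left\langle \left(\Delta H - \langle \Delta H \rangle_a\right)^2 \right\rangle_a \qquad [35]$$
can be obtained if one keeps only terms up to second order in ΔH. This "Gaussian approximation" works well for a number of applications such as the calculation of the pKa of acidic and basic residues of proteins55 and the calculation of the solvation contributions to the excitation energies of tryptophan.56 When ΔH is dominated by contributions from electrostatic interactions, and when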
these interactions are modeled by Coulomb's law, ΔH can be written in a very simple form
in which V_coul is the electrostatic potential of system a, q_i is the partial charge of atom i of system a, and q_i + Δq_i is the corresponding charge in system b. Putting this Coulombic form of ΔH into the Gaussian formula (Eq. [35]) for calculating free energy changes, one can show that some second-order, third-order, and fourth-order effects in Δq_i are included by the Gaussian approximation formula.
Instead of including nonlinear effects explicitly in predicting free energy changes, one can include nonlinear effects implicitly by using semiempirical linear response theories. Åqvist and co-workers examined a special case of semiempirical linear response theory by studying the binding energies of a number of inhibitors to the protein endothiapepsin.58 In this application, the binding energies of inhibitors to the protein were assumed to be linearly dependent on the averaged inhibitor-protein interaction energies and the averaged inhibitor-solvent interaction energies. Therefore, for the binding process
inhibitor + protein → inhibitor-protein complex
with associated binding free energy change ΔA, the following linear-response-type formula was used to approximate ΔA:
$$\Delta A = \alpha\left(\langle U^{p}_{\text{electrostatics}}\rangle - \langle U^{s}_{\text{electrostatics}}\rangle\right) + \gamma\left(\langle U^{p}_{\text{Lennard-Jones}}\rangle - \langle U^{s}_{\text{Lennard-Jones}}\rangle\right) \qquad [37]$$
where ⟨U^p_electrostatics⟩ and ⟨U^p_Lennard-Jones⟩ are, respectively, the averaged inhibitor-protein electrostatic interaction energy and the averaged Lennard-Jones interaction energy in a solvated inhibitor-protein simulation; ⟨U^s_electrostatics⟩ and ⟨U^s_Lennard-Jones⟩ are, respectively, the averaged inhibitor-solvent electrostatic interaction energy and the averaged Lennard-Jones interaction energy in an inhibitor-solvent simulation; and α and γ are two empirical parameters obtained by fitting Eq. [37] to a set of experimental data. Once the parameters α and γ of Eq. [37] have been determined, this relation may be used to predict the binding energies of other similar inhibitors to the protein. In the work of Åqvist et al.,58 α was taken to be 0.5, which has an approximate theoretical basis.59,60 No firm theoretical framework has yet been worked out to guide the choice of γ, so this parameter was treated as an empirical parameter. Promising results for the binding energies of a number of inhibitors to the protein endothiapepsin were obtained by these authors.58 This
semiempirical linear response theory was later extended by Carlson and Jorgensen61 to calculate the hydration free energies of a number of organic molecules. Carlson and Jorgensen also treated α as an empirical parameter, and they added an extra term proportional to the accessible surface area of a solute molecule. Again, encouraging results were obtained in this application. In addition, the method of Åqvist et al. was later employed successfully by Paulsen and Ornstein62 to study the binding of 11 compounds to cytochrome P-450. Because the semiempirical linear response theory appeared to be so successful in several applications, it is useful to think about what can be done to further improve this theory. The addition of accessible surface area terms by Carlson and Jorgensen61 improved the flexibility of the theory, so it is interesting to ask whether one can use a microscopic description for the effects represented by the accessible surface area terms. These terms probably reflect contributions from hydrophobic effects, which may be described by the solvent-solvent interaction energy. In fact, when carrying out sensitivity analysis on several terminally blocked amino acids in methanol, we found that the solvent-solvent interaction energy was altered by the dissolution of a solute (Table 4). A comparison of the charge sensitivities of the Helmholtz free energy of liquid methanol with those obtained from solutions of glycine, threonine, and serine dipeptides in methanol shows that the dissolution of each of these solutes altered the solvent-solvent interaction energies. Therefore, perhaps one could replace the accessible surface area terms by terms involving solvent-solvent interaction energies in the solutions of the protein, the inhibitor, and the protein-inhibitor complex. One might also improve the semiempirical linear response theory by adding an intrasolute interaction term. However, adding this term will increase the number of empirical parameters by one. The use of this extra term is practical only when sufficient experimental data are available and enough simulations are done to allow the determination of the extra parameter.

Table 4. Charge Sensitivity of the Helmholtz Free Energy of Liquid Methanol, and Solutions of Glycine (G), Threonine (T), and Serine (S) Dipeptides in Methanol(a)
System          ∂A/∂q_om (kJ/mol·esu)   ∂A/∂q_cm (kJ/mol·esu)   ∂A/∂q_hm (kJ/mol·esu)   Number of Methanols in Simulation
Methanol        10                      -27                     -106                    216
G in methanol   17                      -42                     -173                    196
T in methanol   16                      -42                     -167                    230
S in methanol   15                      -42                     -167                    230
(a) q_om, q_cm, and q_hm are, respectively, the partial charges of the oxygen, the methyl group, and the hydroxyl hydrogen of a methanol molecule in the OPLS force field (Ref. 80). In the simulations, the OPLS parameters were used for methanol, and the GROMOS (Ref. 16) parameters were used for the dipeptides.

Other possible improvements might include the use of nonlinear scaling relationships. For example, the use of the functional form
where β and δ are additional empirical parameters, may improve the performance of the semiempirical theory, but two more adjustable parameters would need to be fit to available experimental data.
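The fitting step behind the semiempirical linear response formula of Eq. [37] can be sketched as a one-parameter least-squares problem when α is held at 0.5 and only γ is adjusted. The code below is a minimal illustration, not the procedure of the cited authors: the function names are hypothetical, and the energy differences in the usage section are invented placeholder numbers, not simulation or experimental data.

```python
def fit_gamma(d_elec, d_lj, dA_exp, alpha=0.5):
    """Least-squares gamma for dA = alpha * d_elec + gamma * d_lj with alpha fixed.

    d_elec, d_lj: differences of averaged electrostatic and Lennard-Jones
    interaction energies (bound-state simulation minus solvent simulation).
    dA_exp: reference binding free energies for the same compounds.
    """
    numerator = sum((a - alpha * e) * l for e, l, a in zip(d_elec, d_lj, dA_exp))
    denominator = sum(l * l for l in d_lj)
    return numerator / denominator


def predict_binding(d_elec, d_lj, gamma, alpha=0.5):
    """Linear-response estimate of the binding free energy for one compound."""
    return alpha * d_elec + gamma * d_lj


if __name__ == "__main__":
    # Hypothetical energy differences (kcal/mol) for three training compounds.
    d_elec = [-12.0, -8.5, -15.2]
    d_lj = [-20.0, -14.0, -25.5]
    dA_ref = [-9.3, -6.4, -11.8]  # placeholder "experimental" values
    gamma = fit_gamma(d_elec, d_lj, dA_ref)
    print(gamma, predict_binding(-10.0, -18.0, gamma))
```

Fixing α leaves a single coefficient to determine, so the fit has a closed form; adding a surface-area or intrasolute term would turn this into a small multilinear regression and would need correspondingly more reference data.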
Green's Function/Principal Component Analysis and Essential Dynamics
The idea of the Green's function/principal component analysis is closely related to the essential dynamics approach63-66 recently introduced into biomolecular simulations. Other similar works include those by Garcia,67 Ichiye and Karplus,68 Gō and coworkers, and developers of the quasi-harmonic method.26-28 The basic idea of the essential dynamics approach is to diagonalize a covariance matrix σ whose elements are given by the formula [39], in which C is the positional correlation matrix of Eq. [21]. If U is the matrix that diagonalizes σ such that
$$U^{T} \sigma\, U = D \qquad [40]$$
and D is a diagonal matrix, σ can be written in the form
$$\sigma = \sum_{i=1}^{N} D_{ii}\, u_i u_i^{T}$$
in which D_ii is a diagonal element of D, and u_i is the eigenvector represented by the ith column of U. Therefore, the positional covariance matrix σ can be written as a sum of N matrices, where N is the dimension of the matrix σ. If one arranges the D_ii in decreasing order of magnitude, σ can be written as a sum of terms with decreasing contributions to σ. In this way, one can identify from the leading terms of Eq. [39] the most important principal components that determine the atomic positional fluctuation of a (bio)molecule. For biomolecules, a few principal components usually dominate the contributions to σ. This is because the atomic positional fluctuations are dominated by contributions from
a small number of large-amplitude, low-frequency collective modes of the biomolecule. The higher frequency modes introduce only small-amplitude, local atomic fluctuations that are approximately harmonic. The large-amplitude collective motions of certain biomolecules have long been thought to play important functional roles. Previously, these large-amplitude biomolecular motions had been studied mostly by normal mode and quasi-harmonic analyses.26-28 A quasi-harmonic analysis is a special form of normal mode analysis in which the second-order derivatives of the potential energy (which form the elements of a Hessian matrix) are replaced by an effective force constant (or Hessian) matrix constructed from a covariance matrix of positional fluctuations obtained from a molecular dynamics simulation. The use of such an effective force constant (or Hessian) matrix can account for some anharmonic effects that a standard normal mode analysis neglects. From the Green's function described by Eq. [22], it is easy to see that an effective Hessian matrix can be constructed from the inverse of a Green's function matrix that is related to a covariance matrix of positional fluctuations. (Remember that a component of a force acting on an atom can be obtained as the negative of a potential gradient.) A key reason for developing the Green's function/principal component analysis approach is to study structural responses of biomolecules due to perturbations introduced to different parts of the biomolecules. For this application, one works with a Green's function matrix directly rather than using it to construct an effective force constant matrix for a quasi-harmonic analysis. If one diagonalizes a Green's function matrix, one can also study collective structural responses as in a normal mode or quasi-harmonic analysis.
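The diagonalization of a positional covariance matrix described above can be sketched without any external library by extracting only the leading principal component. The code below is a minimal illustration under stated assumptions: each trajectory frame is a flat tuple of coordinates, the covariance is the plain (unweighted) one, and power iteration is substituted for a full eigensolver, which is sufficient only when the largest eigenvalue is well separated and the starting vector is not orthogonal to its eigenvector. The function names are hypothetical.

```python
import math


def covariance(frames):
    """Positional covariance matrix of a trajectory given as coordinate tuples."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[sum((f[a] - mean[a]) * (f[b] - mean[b]) for f in frames) / n
             for b in range(dim)]
            for a in range(dim)]


def leading_component(cov, iterations=500):
    """Largest eigenvalue and its eigenvector by power iteration."""
    dim = len(cov)
    v = [1.0 + k for k in range(dim)]  # generic, non-symmetric starting vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(dim))
                     for a in range(dim))
    return eigenvalue, v


def variance_fraction(cov, eigenvalue):
    """Fraction of the total positional fluctuation carried by one component."""
    return eigenvalue / sum(cov[k][k] for k in range(len(cov)))


if __name__ == "__main__":
    # Synthetic "trajectory": large correlated motion in x and y, small jitter in z.
    frames = [(float(k), float(k), 0.1 * (-1) ** k) for k in range(-5, 6)]
    cov = covariance(frames)
    eigenvalue, vector = leading_component(cov)
    print(eigenvalue, vector, variance_fraction(cov, eigenvalue))
```

For a real biomolecule the covariance matrix has 3N rows and columns, and a symmetric eigensolver from a numerical library would normally replace the power iteration so that several leading components can be obtained at once.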
A Green's function analysis offers the additional advantage of allowing one to introduce explicit external perturbations (through the calculation of δf_j of Eq. [22]) to study the structural responses introduced by these perturbations. For example, δf_j can be calculated from the interaction potential between the averaged structure of a biomolecule and its ligand; Eq. [22] then allows one to use the results from a molecular dynamics or Monte Carlo simulation of the unperturbed biomolecule to predict what structural responses the ligand may introduce to the biomolecule when the ligand binds to it. Relative to the essential dynamics approach, the Green's function method can provide a more realistic description of how biomolecular structure may respond when it interacts with its ligand(s). Because the effects due to the perturbation forces from a ligand are not accounted for, simply analyzing the collective modes of an isolated biomolecule and trying to make functional inferences from these unperturbed modes can sometimes be misleading. The Green's function approach provides a first-order correction for studying the induction of structural changes to a biomolecule by one or more ligands without actually carrying out a simulation on the biomolecule-ligand(s) complex. If the eigenvectors u_i of a Green's function matrix corresponding to an uncomplexed biomolecule have already been obtained, the structural response of the biomolecule to the binding of one or more ligands can be obtained from Eqs. [22] and [39] as follows:
One can also use a Green's function matrix directly (without diagonalization) with the relation
where x_i is a component of an atomic coordinate and f_j is a component of a force acting on an atom of a biomolecule. The Green's function approach can also be used to account for intramolecular motions in a Brownian dynamics simulation to study the diffusion-influenced reaction rates between two molecules such as an enzyme-substrate pair. Brownian dynamics simulations of the diffusional encounters between biomolecules and their substrates used to be carried out by assuming the biomolecules and the ligands to be completely rigid. However, this is a very crude assumption: the approach of a ligand may dynamically distort the biomolecule to facilitate entry of the ligand to the active site of the biomolecule. This structural response of the biomolecule can be described by a Brownian dynamics simulation model in which the biomolecule is approximated by a collection of suitably connected spheres undergoing Brownian motion. The Green's function approach provides an alternative whereby the results from a molecular dynamics simulation with an explicit-solvent model are used to describe the dynamical structural response of the biomolecule to an approaching ligand during a Brownian dynamics simulation. In other words, the Green's function matrix for a biomolecule is obtained from a molecular dynamics trajectory, and the forces of Eq. [42] or [43] are obtained in a Brownian dynamics simulation algorithm from the interaction potential between the biomolecule and its approaching ligand. The advantage of the Green's function approach is that solvent effects on the structural fluctuations of a biomolecule can be treated in a more realistic manner because an explicit-solvent model can be used. However, the Green's function approach is applicable only when the structural response of a biomolecule depends approximately linearly on the perturbing forces introduced by the approaching ligand.
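The direct use of a Green's function matrix can be sketched as a matrix-vector product between a fluctuation-derived matrix and the ligand forces. The code below is an illustration under an explicit assumption: since Eq. [22] is not reproduced in this section, the standard static linear-response form, in which the Green's function is the positional covariance divided by k_B T, is assumed here. The function names and the numbers in the usage section are hypothetical.

```python
def greens_function(cov, kT):
    """Assumed linear-response form: G_ij = <dx_i dx_j> / (k_B T)."""
    return [[c / kT for c in row] for row in cov]


def structural_response(G, forces):
    """First-order displacement of each coordinate: dx_i = sum_j G_ij f_j."""
    return [sum(g * f for g, f in zip(row, forces)) for row in G]


if __name__ == "__main__":
    kT = 0.6  # roughly k_B T in kcal/mol near room temperature
    # Covariance of two uncoupled harmonic coordinates with force constants 2 and 8:
    # each variance equals kT / k.
    cov = [[kT / 2.0, 0.0],
           [0.0, kT / 8.0]]
    ligand_forces = [1.0, -4.0]  # hypothetical forces exerted by an approaching ligand
    print(structural_response(greens_function(cov, kT), ligand_forces))
```

For a harmonic coordinate this reduces to the familiar displacement f/k, which is a convenient sanity check; for a real biomolecule the covariance would come from an explicit-solvent molecular dynamics trajectory, and the prediction holds only while the response stays in the linear regime.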
Error Propagation

Sensitivity analysis can also be applied to examine the propagation of errors that arise from the use of nonoptimal potential constants in empirical force fields. This issue is less well studied in biomolecular simulations because it is expensive to repeat many biomolecular simulations with many different choices of potential parameters, as needed to evaluate the sensitivity of results to the uncertainties of potential energy parameters. However, one can gain some insights into this issue by calculating the derivatives of many properties/observables with respect to each parameter of a potential model; these derivatives can be calculated relatively easily because they are obtained by carrying out simulations on one reference system only. Once these derivatives are obtained, one can use a Taylor series expansion

$$\Delta O = \sum_j \frac{\partial O}{\partial \lambda_j}\,\Delta\lambda_j + \mathcal{O}([\Delta\lambda_j]^2) \qquad [44]$$

to estimate the effects on simulation results when the force field potential constants are modified. In Eq. [44], $O$ is a simulated property, $\Delta\lambda_j$ is the change of the value of the force field parameter $\lambda_j$ from its value in the force field $F_1$, with which a simulation is done, to its value in a different force field $F_2$, and $\Delta O$ provides an estimate of the change of the value of $O$ when the force field is changed from $F_1$ to $F_2$. This approach was used to study the sensitivity of the calculation of the free energy difference between solutions of serine and threonine dipeptides in methanol when the atomic charges of these molecules were changed from those in the GROMOS force field16 (which was used in the simulation) to those of the AMBER21 and OPLS80 force fields. In this study, it was found that the absolute free energy of each system could be quite sensitive to the choice of atomic partial charges, but the differences of the free energies between these two solutions were less sensitive, suggesting the occurrence of cancellation of errors in free energy difference calculations. Upon changing from the GROMOS atomic partial charges to the AMBER atomic partial charges, for example, the free energy of the serine dipeptide was changed by 81.9 kJ/mol and that of the threonine dipeptide was altered by 78.5 kJ/mol (both were estimated by using the first-order approximation of Eq. [44]). The change in the free energy difference between these two systems is only 3.4 kJ/mol. A complete analysis should include in Eq. [44] other force field parameters, because there could exist correlations among different potential energy parameters when they were determined from fittings to suitable theoretical and/or experimental data of model compounds. The sensitivity of each absolute free energy to changes of force field parameters might therefore be smaller than in the example above if other force field parameters were included in the analysis.
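The first-order estimate and the cancellation of errors can be illustrated with a short sketch. The derivative values and charge changes below are invented for illustration only (they are not taken from the study), and `first_order_change` is a hypothetical helper; only the 81.9 and 78.5 kJ/mol figures come from the text.

```python
import numpy as np

def first_order_change(derivs, delta_params):
    """Delta O ~= sum_j (dO/dlambda_j) * Delta lambda_j (first term of Eq. [44])."""
    return float(np.dot(derivs, delta_params))

# Hypothetical derivatives of two free energies with respect to three shared
# atomic charges (kJ/mol per unit charge), and hypothetical charge changes
# between two force fields.
dG_ser = np.array([400.0, -250.0, 120.0])
dG_thr = np.array([390.0, -240.0, 110.0])
dq = np.array([0.10, -0.05, 0.02])

change_ser = first_order_change(dG_ser, dq)
change_thr = first_order_change(dG_thr, dq)
change_diff = change_ser - change_thr
print(change_ser, change_thr, change_diff)

# Values quoted in the text for the GROMOS -> AMBER charge change (kJ/mol):
quoted_diff = round(81.9 - 78.5, 1)
print(quoted_diff)
```

When the two systems respond similarly to each parameter, the individual changes are large but their difference is small, which is the behavior reported for the serine and threonine dipeptides.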
However, one cannot always rely on adjusting the parameters from one type of potential energy term to compensate for any deficiency in determining the parameters of another potential energy term. For example, the nature of the interactions described by Coulombic terms and Lennard-Jones terms differs quite markedly. If the atomic partial charges employed in Coulombic terms are not adequate for describing the electrostatic properties of a system, one cannot always rely on adjusting the Lennard-Jones parameters to make up for this deficiency. This argument is further accentuated when one wants to reliably describe additional properties of a system from a simulation model. The more properties one wants to properly predict, the less flexibility one has in adjusting the parameters of potential energy terms of one type to compensate for the inappropriate determination of the parameters of potential energy terms of other types. An advantage of calculating sensitivity coefficients is that they can help identify the parameters that are most responsible for a group of system properties. This information can be obtained by examining the sensitivity coefficients and identifying those having the largest magnitudes. One can also employ the principal component analysis technique,13 so that the effects due to the correlations among potential parameters can be accounted for. It is useful to consider the same simple example of computing the free energy difference between serine and threonine dipeptides in methanol to illustrate this. We carried out a principal component analysis on a sensitivity matrix S containing the matrix elements $\partial O_i/\partial \lambda_j$, where $O_i$ is a calculated free energy of a serine or a threonine dipeptide in methanol and $\lambda_j$ is an atomic partial charge of the solute. By carrying out a singular value decomposition on S as described by Eq. [24] with M = S, one obtains the eigenvalues of the matrix D and the eigenvectors contained in the matrices U and V.
The number of nonzero eigenvalues was at most the smaller of the two dimensions of the sensitivity matrix S: the number of systems simulated (two in this example) and the number of different parameters considered in the analysis (16 in this example; they were the atomic partial charges associated with the 16 atoms/extended atoms shown in Table 5, footnote a). There were thus at most two principal components in this example. To understand the insights that each principal component provided, one could use Eq. [25] to write $dO_1$ and $dO_2$ in the following form:
$$dO_1 = U_{1,1} D_{1,1}\,\mathbf{V}_1 \cdot d\boldsymbol{\lambda} + U_{1,2} D_{2,2}\,\mathbf{V}_2 \cdot d\boldsymbol{\lambda}$$
$$dO_2 = U_{2,1} D_{1,1}\,\mathbf{V}_1 \cdot d\boldsymbol{\lambda} + U_{2,2} D_{2,2}\,\mathbf{V}_2 \cdot d\boldsymbol{\lambda} \qquad [45]$$

where $\mathbf{V}_i$ is an eigenvector contained in the matrix V and $d\boldsymbol{\lambda}$ is the vector of changes of the atomic partial charges. Only the eigenvectors corresponding to the two major principal components need to be considered in this example because only these two eigenvectors were associated with nonzero eigenvalues. The first terms of Eqs. [45] arose from the first principal component (associated with the largest eigenvalue), and the second terms of Eqs. [45] originated from the second principal component (associated with the second largest eigenvalue). For the first principal component, $U_{1,1} = -0.74$ and $U_{2,1} = -0.67$ were nearly the same. Accordingly, the first terms produced approximately the same effect on the free energy of the two solutions (recall from Eqs. [26] that the vectors in the matrix U are associated with the observables or simulated properties). Thus, the first principal component is a general measure of the total free energy sensitivities of the two systems. On the other hand, $U_{1,2} = -0.67$ and $U_{2,2} = 0.74$ of the second principal component had about the same magnitude but different signs. Accordingly, the second principal component describes features that accounted for the differences of the free energy of the two solutions.

Table 5. Principal Component Analysis of the Free Energies of Serine and Threonine Dipeptides in Methanol at 300 K(a)

Quantity      j = 1      j = 2
D_{j,j}       1363       197
U_{1,j}       -0.74      -0.67
U_{2,j}       -0.67       0.74
V_{1,j}       -0.17       0.04
V_{2,j}        0.01       0.05
V_{3,j}       -0.27       0.03
V_{4,j}        0.50       0.05
V_{5,j}        0.43      -0.04
V_{6,j}       -0.17      -0.02
V_{7,j}        0.06       0.38
V_{8,j}        0.05       0.02
V_{9,j}        0.18       0.02
V_{10,j}      -0.10       0.80
V_{11,j}      -0.06      -0.03
V_{12,j}      -0.24      -0.19
V_{13,j}       0.10      -0.01
V_{14,j}       0.44       0.09
V_{15,j}       0.33      -0.01
V_{16,j}       0.05      -0.40

(a) D_{j,j} are the diagonal elements of the matrix D in Eq. [24]. U_{i,j} is a matrix element of U, where i labels one of the two solutions and j labels the jth principal component. V_{i,j} is a matrix element of V, where i labels one of the following atoms/extended atoms of the two dipeptides: 1 = CH3, 2 = C, 3 = O, 4 = N, 5 = H, 6 = Cα, 7 = Cβ(S), 8 = Oγ, 9 = H, 10 = Cγ, 11 = C, 12 = O, 13 = N, 14 = H, 15 = CH3, 16 = Cβ(T), and j labels a principal component.

By examining the eigenvector of V for the second principal component, one could identify the atomic partial charges that were most crucial in determining the free energy difference of the two solutions. The largest component of this eigenvector was associated with the atomic partial charge of the γ carbon (in the extended atom representation) of the threonine dipeptide. This result suggested that the free energy difference of the two solutions was largely accounted for by atoms in or near the γ-methyl group of the threonine dipeptide. Consequently, the uncertainty of the free energy difference was largely determined by the uncertainty of the potential parameters associated with atoms near the γ-methyl group of the threonine dipeptide. The uncertainty of the parameters associated with the other atoms of the solutes might not affect the calculated free energy difference of the two solutions very much.

This principal component analysis further demonstrates how cancellation of errors can occur in free energy difference calculations. Although the parameters employed in free energy calculations may not be optimal, cancellations of errors can occur when the free energy difference between two similar systems is calculated as described above. This is because the uncertainties of many parameters produce comparable effects on the free energy of two similar systems. This example also illustrates why it is easier to calculate the difference between two free energies than the free energies individually. However, as the systems become more different, more parameters may become significant in determining the free energy difference between the two systems. This subset of "essential" parameters must still be sufficiently reliable to give a proper estimate of the free energy difference between the two systems. Principal component analysis can also be carried out by using simulation data obtained at different conditions, such as at different temperatures, so that more observables can be used to construct a larger sensitivity matrix. This has been done for the evaluation of serine and threonine dipeptides in methanol,29 but the key findings were essentially the same as those described above when the results from only two simulations were used in the analysis.
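The interpretation of the two principal components can be checked numerically from the entries of Table 5. The sketch below rebuilds an approximate sensitivity matrix as S = U D V^T from the tabulated (two-decimal) values, so the recovered singular values match the table only approximately. The order assumed for the first four V_{i,2} entries and the perturbation size `step` are assumptions made for this illustration.

```python
import numpy as np

# Values as read from Table 5 (rounded to two decimals there).
D = np.diag([1363.0, 197.0])
U = np.array([[-0.74, -0.67],
              [-0.67,  0.74]])   # rows: the two solutions
V = np.array([
    [-0.17, 0.04], [0.01, 0.05], [-0.27, 0.03], [0.50, 0.05],
    [0.43, -0.04], [-0.17, -0.02], [0.06, 0.38], [0.05, 0.02],
    [0.18, 0.02], [-0.10, 0.80], [-0.06, -0.03], [-0.24, -0.19],
    [0.10, -0.01], [0.44, 0.09], [0.33, -0.01], [0.05, -0.40],
])                               # rows: the 16 atoms/extended atoms

S = U @ D @ V.T                  # approximate 2 x 16 matrix of dO_i/dlambda_j

# Singular values recovered from S should be close to the tabulated D entries.
singular_values = np.linalg.svd(S, compute_uv=False)

# Small charge perturbations along each right singular vector (hypothetical size).
step = 0.01
dO_pc1 = S @ (step * V[:, 0])    # first component: both free energies shift alike
dO_pc2 = S @ (step * V[:, 1])    # second component: the shifts have opposite signs

largest_atom_pc2 = int(np.argmax(np.abs(V[:, 1]))) + 1  # 1-based atom label
print(singular_values, dO_pc1, dO_pc2, largest_atom_pc2)
```

The atom label with the largest weight in the second component is 10, the γ carbon of the threonine dipeptide, in line with the discussion above.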
Potential Energy Function Refinement

Sensitivity analysis is also a tool that can help to refine potential energy functions for (bio)molecular simulations. It can help one decide whether a specific feature needs to be included in a potential function for describing a specified set of properties of a given class of molecules. For example, because point charge models are commonly used in (bio)molecular modeling, it is useful to inquire whether a dispersed charge representation would improve the description of intra- and intermolecular electrostatic interactions. One study of this type was carried out by Zhu and Wong,40 who included in the force field a squared Lorentzian function $f(\mathbf{r} - \mathbf{r}_k)$ (Eq. [46]) to describe the smearing out of charges about the oxygens and hydrogens of water molecules in the simulation of liquid water. In Eq. [46], $\mathbf{r}_k$ is the coordinate vector of a water oxygen or hydrogen, and $a$ is a parameter that controls the width of the charge dispersion. In a simulation of several properties of liquid water (internal energy, pressure, isothermal compressibility, Kirkwood factor, radial distribution functions, and distribution function of the interaction energy of a water molecule with its surroundings) using a polarizable water model with this artificially dispersed charge representation, the sensitivities of the properties to perturbations of $a$, measured by first-order, log-normalized sensitivity coefficients, were two orders of magnitude smaller than the sensitivities to perturbations of the parameters that affected these properties most strongly. This finding suggests that a point charge representation is adequate for describing many properties of liquid water. The use of a more expensive dispersed charge representation was not crucial for describing the properties of liquid water discussed above. This example illustrates how sensitivity analysis can help to simplify a potential model for (bio)molecular simulations.

The sensitivity coefficients obtained from a sensitivity analysis can also help guide the optimization of the parameters of an empirical force field to best fit a given set of experimental/theoretical data. In fact, first-order sensitivity coefficients are quantities that are typically calculated in least-squares refinement programs for optimizing the force field parameters to best fit a set of experimental/theoretical data.15 The difference between a brute force least-squares refinement of model parameters and a sensitivity analysis is that the latter analyzes the informational content provided by these sensitivity coefficients, thereby helping to refine a set of parameters more intelligently. For example, sensitivity coefficients with small absolute magnitudes identify parameters that may not be readily refined by a given set of experimental/theoretical data. Other suitable experimental/theoretical data must be included before these parameters can be readily determined. The inclusion of these poorly determined parameters in a parameter refinement process may degrade the determination of the other parameters.
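A first-order, log-normalized sensitivity coefficient can be written as ∂ ln O / ∂ ln λ = (λ/O)(∂O/∂λ), which makes coefficients for different parameters and properties directly comparable. The sketch below estimates such coefficients by finite differences for a toy observable; the function `observable`, its exponents, and the parameter values are invented for illustration and have no connection to the water model of the study. It assumes positive parameters and a positive observable so that the logarithms are defined.

```python
import math

def log_sensitivity(func, params, name, rel_step=1e-4):
    """Central-difference estimate of d ln O / d ln lambda for one parameter."""
    up = dict(params)
    up[name] = params[name] * (1 + rel_step)
    down = dict(params)
    down[name] = params[name] * (1 - rel_step)
    d_ln_O = math.log(func(**up)) - math.log(func(**down))
    d_ln_lambda = math.log(1 + rel_step) - math.log(1 - rel_step)
    return d_ln_O / d_ln_lambda

# Toy observable (purely illustrative): strongly dependent on q, weakly on a.
def observable(q, a):
    return q ** 2 * a ** 0.02

params = {"q": 0.8, "a": 0.3}
s_q = log_sensitivity(observable, params, "q")
s_a = log_sensitivity(observable, params, "a")
print(s_q, s_a, s_q / s_a)
```

In this toy case the coefficient for `a` is two orders of magnitude smaller than that for `q`, the kind of comparison that would suggest a parameter (and the feature it controls) could be dropped from a model without much loss.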
An example of parameter optimization is found in the determination of atomic partial charges.81 A popular technique currently in use is to derive the atomic partial charges of a molecule by fitting these charges to the quantum mechanical electrostatic potential calculated at a number of points around the molecule. Partial charges of atoms buried inside the molecule are usually less well defined than those of atoms that are exposed because the electrostatic potential is usually calculated at points outside the van der Waals surface of the molecule. (Within the van der Waals surface, quantum effects are also significant; classical electrostatics is inadequate for describing the interactions between two atoms that are closer than the sum of their van der Waals radii.) When no additional appropriate data are provided for determining the atomic partial charges, it has been found that imposing constraints on the buried charges to keep them close to some physically meaningful values can help obtain a more reasonable set of charges for the molecule. The charges derived in this way can be more readily transferred to other (similar) molecules, and they better describe intramolecular electrostatic interactions.81

Correlations among parameters and observables can complicate a parameter refinement process. There may be insufficient data to determine N parameters of a force field given N experimental or theoretical data points if there exist significant correlations among the potential constants or among the data used for determining these parameters. A principal component analysis can analyze such correlation behavior and reveal how many useful relations are really provided by a set of experimental/theoretical data. To help illustrate this application, it is again useful to take the same simple example above for evaluating the free energy of the serine and threonine dipeptides in methanol.29 In this case, the principal component analysis gave only two principal components that were associated with nonzero eigenvalues (or singular values), indicating that two potential parameters at most could be determined by using the free energies of the two solutions. However, the first eigenvalue was almost an order of magnitude larger than the second eigenvalue, suggesting that the first principal component was significantly higher in informational content than the second with respect to determining the atomic partial charges of the two peptides. When this analysis was extended to include nine solutions of the two solutes in two different solvents (methanol and water) at different temperatures, the informational content did not increase much.29 Adding seven more sets of data for assisting the parameter refinement increased the number of useful principal components only by approximately one, and the total number of principal components associated with eigenvalues of significant magnitudes was only three. Therefore, from a sensitivity analysis and a principal component analysis, one can gain insights into how many useful relations can be derived from a given set of experimental/theoretical data for refining force field parameters. These analyses can also be useful in the selection of suitable experimental/theoretical data to use for force field parameterization.
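Counting the principal components that carry significant information can be sketched as follows. The sensitivity matrix here is synthetic: it is built with prescribed singular values (invented for illustration) for nine observables and 16 parameters, and the 5% relative cutoff used to call a singular value "significant" is an arbitrary assumption, as is the helper name `significant_components`.

```python
import numpy as np

def significant_components(S, rel_tol=0.05):
    """Count singular values larger than rel_tol times the largest one."""
    sv = np.linalg.svd(S, compute_uv=False)
    return int(np.sum(sv > rel_tol * sv[0])), sv

rng = np.random.default_rng(1)
n_obs, n_par = 9, 16

# Prescribed (made-up) singular values: three dominate, six are negligible.
sv_true = np.array([1000.0, 150.0, 60.0, 2.0, 1.0, 0.5, 0.2, 0.1, 0.05])

# Random orthonormal factors so that S has exactly these singular values.
U, _ = np.linalg.qr(rng.normal(size=(n_obs, n_obs)))
V, _ = np.linalg.qr(rng.normal(size=(n_par, n_obs)))
S = U @ np.diag(sv_true) @ V.T

n_useful, sv = significant_components(S)
print(n_useful, np.round(sv, 2))
```

Although nine observables are supplied, only three components pass the cutoff in this construction, mirroring the situation described above in which additional data sets added little new information.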
Ideally, one would like to include the smallest amount of data containing the largest amount of information; the judicious choice of experimental/theoretical data needed to accomplish this can help reduce the computational costs of refining a set of potential parameters. Similarly, due to the correlation of potential parameters, N relationships may not readily determine N potential parameters. For the example above of serine and threonine dipeptides in methanol,29 the two relationships provided by the two sets of data involved more than two force field parameters, as suggested by the many components of the eigenvectors having significant magnitudes in V. If only two coefficients were nonzero for the two eigenvectors, these two relationships could have readily determined the two parameters. Because more than two coefficients were nonzero, there exist infinitely many combinations of parameters that could give the same free energy values for the two solutions (assuming the free energies of the two solutions are the only data available for determining these parameters). This phenomenon is ubiquitous in (bio)molecular modeling. Different force fields with a similar functional form commonly have different values of analogous parameters. Even so, similar values of certain selected properties can often be obtained by those different force fields. The possibility exists of having many possible combinations of parameters that can describe certain properties with comparable accuracies, and it is consistent with the principal component analysis of dipeptides discussed above, i.e., that different parameter sets may give similar values for a subset of (bio)molecular properties. However, the more properties one requires a force field to describe properly, the more one needs to choose the parameters carefully, because it becomes less likely that adjusting one or more of those parameters will serve to compensate for the deficiency of a poorly determined parameter.

Nevertheless, for many applications, it is probably inevitable and adequate to develop ad hoc problem-specific force fields. Force fields must be relatively simple and computationally efficient for studying complex macromolecules such as proteins and DNA. The force fields usually describe properties of certain types better than others, depending on how the force fields were developed. We have already learned from the sensitivity analysis studies of liquid water and a two-dimensional square lattice model of protein folding that different system properties can be determined by different features of a potential model. An example employing a more realistic force field can also be found in the application of sensitivity analysis to study the determinants of the structural and thermodynamical properties of the protein avian pancreatic polypeptide (APP).19 It was found that the size and shape of the protein were determined to a large extent by electrostatic interactions, whereas the free energy of the protein was more sensitive to the surface-area-dependent solvation energy terms that modeled hydrophobic effects. Consequently, it is possible to develop an ad hoc force field that is designed to describe certain classes of (bio)molecular properties properly. The failure of such an ad hoc force field to describe properties of other types does not necessarily indicate that this force field is useless; rather, caution should be exercised in any attempt to apply it to other properties.
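The statement that two relationships cannot pin down more than two parameters can be made concrete in the first-order (linearized) picture: any parameter change lying in the null space of the sensitivity matrix leaves the observables unchanged. The 2 x 4 matrix and the parameter changes below are hypothetical numbers chosen only to illustrate this point.

```python
import numpy as np

# Hypothetical sensitivity matrix: 2 observables, 4 parameters.
S = np.array([[2.0, -1.0,  0.5, 1.5],
              [1.0,  3.0, -2.0, 0.5]])

# Full SVD: the rows of Vt beyond the rank of S span its null space.
_, sv, Vt = np.linalg.svd(S)
rank = int(np.sum(sv > 1e-12))
null_basis = Vt[rank:]           # here: 2 directions that do not change S @ x

d_lambda = np.array([0.1, -0.2, 0.05, 0.0])
shifted = d_lambda + 0.3 * null_basis[0] - 0.7 * null_basis[1]

# Both parameter changes give the same first-order change in the observables.
print(S @ d_lambda, S @ shifted)
```

Any multiples of the null-space directions can be added, so to first order there are infinitely many parameter sets reproducing the same two observables; further independent data are needed to distinguish among them.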
CONCLUSIONS

In this chapter, we summarized some of the recent developments in the sensitivity analysis approach for biomolecular simulations. Although more work needs to be done to exploit the full capability of the sensitivity analysis approach, the initial applications of this technique have already generated many useful insights for enhancing biomolecular simulations and improving models for carrying them out. The sensitivity analysis approach is an efficient and effective method for systematically identifying the determinants of interesting biomolecular properties, which can be difficult to identify by intuition alone. Although first-order sensitivity theory is not always reliable in predicting the properties of a structurally modified (bio)molecule, it may be useful as a preliminary classification tool for suggesting a small number of modifications that can be further exploited with more sophisticated (but more expensive) computational simulations and/or experimental studies. The advantage of first-order sensitivity theory is that it is relatively inexpensive to use. The reliability of sensitivity analysis in the design of novel bioactive compounds can be improved by employing higher order sensitivity theory, with the Gaussian-type approximation as a successful example. The encouraging preliminary applications of semiempirical linear response theories to predicting free energy changes should also fuel further research on exploiting the full capability of this approach. The molecular dynamics/Monte Carlo Green's function approach,24 which is an extension of the molecular mechanics Green's function approach and is a special form of sensitivity analysis, is tightly connected to the essential dynamics method introduced recently for studying the possible functional roles of collective modes of biomolecules. An advantage of the Green's function approach is that the effects arising from the introduction of perturbations to a (bio)molecule by its interacting partners can be explicitly included to predict how these perturbations may affect the structure of the (bio)molecule. The sensitivity analysis approach has also been shown to be useful for studying error propagation due to the use of nonoptimal parameters in biomolecular simulations and for examining how error cancellations may occur in free energy difference calculations. The sensitivity analysis approach can also suggest how potential functions could be simplified and how the parameters of these functions can be effectively refined.
Although more work needs to be carried out to fully examine the utility and limitations of the sensitivity analysis approach in (bio)molecular modeling, this methodology has already produced useful insights into the determinants of (bio)molecular properties that are difficult to obtain by intuition alone. A key strength of this approach is its ability to examine in an efficient manner many possible factors that may determine a set of (bio)molecular properties. It will be interesting to see how the sensitivity analysis approach can be used with other computational and experimental techniques to gain even deeper insights into the determining factors.
ACKNOWLEDGMENTS

Some of the research described in this chapter was supported by the Petroleum Research Fund administered by the American Chemical Society, the National Institutes of Health, the Office of Naval Research, and the Bristol-Myers Squibb Institute for Medical Research. Work carried out in our laboratories involved a number of collaborators: Richard E. Bleil, Axel Brünger, Gauri Misra, Robert B. Nachbar Jr., Clarence Schutt, Tom Simonson, Roberta Susnow, Qiang Wang, Hong Zhang, and Sheng-bai Zhu.
REFERENCES

1. J. A. McCammon and S. C. Harvey, Dynamics of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, 1987.
2. C. L. Brooks III, M. Karplus, and B. M. Pettitt, Proteins: A Theoretical Perspective of Dynamics, Structure, and Thermodynamics, Wiley, New York, 1988.
3. T. P. Lybrand, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1990, Vol. 1, pp. 295-320. Computer Simulation of Biomolecular Systems Using Molecular Dynamics and Free Energy Perturbation Methods.
4. A. E. Torda and W. F. van Gunsteren, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1992, Vol. 3, pp. 143-172. Molecular Modeling Using NMR Data.
5. T. P. Straatsma, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1996, Vol. 9, pp. 81-127. Free Energy by Molecular Simulation.
6. R. Tomovic and M. Vukobratovic, General Sensitivity Theory, American Elsevier, New York, 1972.
7. P. Franck, Introduction to System Sensitivity Theory, Academic Press, New York, 1987.
8. L. Eno and H. Rabitz, Adv. Chem. Phys., 51, 177 (1982). Sensitivity Analysis and Its Role in Quantum Scattering Theory.
9. H. Rabitz, M. Kramer, and D. Dacol, Annu. Rev. Phys. Chem., 34, 419 (1983). Sensitivity Analysis in Chemical Kinetics.
10. H. Rabitz, Chem. Rev., 87, 101 (1987). Chemical Dynamics and Kinetics Phenomena as Revealed by Sensitivity Analysis Techniques.
11. H. Rabitz, Science, 246, 221 (1989). System Analysis at the Molecular Scale.
12. G. E. Forsythe, M. A. Malcolm, A. Michael, and C. B. Moler, Computer Methods for Mathematical Computations, Prentice-Hall, Englewood Cliffs, NJ, 1977.
13. T. W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley, New York, 1958.
14. U. Dinur and A. T. Hagler, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991, Vol. 2, pp. 99-164. New Approaches to Empirical Force Fields.
15. J. P. Bowen and N. L. Allinger, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991, Vol. 2, pp. 81-97. Molecular Mechanics: The Art and Science of Parameterization.
16. W. F. van Gunsteren and H. J. C. Berendsen, GROMOS, Groningen, Netherlands, 1987.
17. S.-b. Zhu and C. F. Wong, J. Chem. Phys., 98, 8892 (1993). Sensitivity Analysis of Water Thermodynamics.
18. R. E. Bleil, C. F. Wong, and H. Rabitz, J. Phys. Chem., 99, 3379 (1995). Sensitivity Analysis of a Two-Dimensional Lattice Model of Protein Folding.
19. H. Zhang, C. F. Wong, T. Thacher, and H. Rabitz, Proteins: Struct., Funct., Genet., 23, 218 (1995). Parametric Sensitivity Analysis of Avian Pancreatic Polypeptide (APP).
20. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus, J. Comput. Chem., 4, 187 (1983). CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations.
21. S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, C. Ghio, G. Alagona, S. Profeta Jr., and P. Weiner, J. Am. Chem. Soc., 106, 765 (1984). A New Force Field for Molecular Mechanical Simulation of Nucleic Acids and Proteins.
22. R. Susnow, R. B. Nachbar Jr., C. Schutt, and H. Rabitz, J. Phys. Chem., 95, 8585 (1991). Sensitivity of Molecular Structure to Intramolecular Potentials.
23. R. Susnow, R. B. Nachbar Jr., C. Schutt, and H. Rabitz, J. Phys. Chem., 95, 10662 (1991). Study of Amide Structure Through Sensitivity Analysis.
24. C. F. Wong, C. Zheng, J. Shen, J. A. McCammon, and P. G. Wolynes, J. Phys. Chem., 97, 3100 (1993). Cytochrome c: A Molecular Proving Ground for Computer Simulations.
25. P. H. Hünenberger, A. E. Mark, and W. F. van Gunsteren, J. Mol. Biol., 252, 492 (1995). Fluctuation and Cross-Correlation Analysis of Protein Motions Observed in Nanosecond Molecular Dynamics Simulations.
26. M. Born and K. Huang, Dynamical Theory of Crystal Lattices, Clarendon Press, Oxford, 1954.
27. M. Karplus and J. N. Kushick, Macromolecules, 14, 325 (1981). Method for Estimating the Configurational Entropy of Macromolecules.
28. R. M. Levy, M. Karplus, J. Kushick, and D. Perahia, Macromolecules, 17, 1370 (1984). Evaluation of the Configurational Entropy for Proteins: Application to Molecular Dynamics Simulations of an α-Helix.
29. C. F. Wong and H. Rabitz, J. Phys. Chem., 95, 9628 (1991). Sensitivity Analysis and Principal Component Analysis in Free Energy Calculations.
30. H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, and J. Hermans, in Intermolecular Forces, B. Pullman, Ed., Reidel, Dordrecht, 1981, pp. 331ff.
31. W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L. Klein, J. Chem. Phys., 79, 926 (1983). Comparison of Simple Potential Functions for Simulating Liquid Water.
32. M. Sprik and M. L. Klein, J. Chem. Phys., 89, 7556 (1988). A Polarizable Model for Water Using Distributed Charge Sites.
33. J. W. Halley, J. R. Rustad, and A. Rahman, J. Chem. Phys., 98, 4110 (1993). A Polarizable, Dissociating Molecular Dynamics Model for Liquid Water.
34. D. N. Bernardo, Y. Ding, and K. Krogh-Jespersen, J. Phys. Chem., 98, 4180 (1994). An Anisotropic Polarizable Water Model: Incorporation of All-Atom Polarizabilities into Molecular Mechanics Force Fields.
35. R. D. Mountain, J. Chem. Phys., 103, 3084 (1995). Comparison of a Fixed-Charge and a Polarizable Water Model.
36. I. M. Svishchev, P. G. Kusalik, and R. J. Boyd, J. Chem. Phys., 105, 4742 (1996). Polarizable Point-Charge Model for Water: Results Under Normal and Extreme Conditions.
37. A. A. Chialvo and P. T. Cummings, J. Chem. Phys., 105, 8274 (1996). Engineering a Simple Polarizable Model for the Molecular Simulation of Water Applicable over Wide Ranges of State Conditions.
38. S. Zhu, S. Yao, J. Zhu, S. Singh, and G. W. Robinson, J. Phys. Chem., 95, 6211 (1991). A Flexible/Polarizable Simple Point Charge Water Model.
39. S.-b. Zhu, S. Singh, and G. W. Robinson, J. Chem. Phys., 95, 2791 (1991). A New Flexible/Polarizable Water Model.
40. S.-b. Zhu and C. F. Wong, J. Phys. Chem., 98, 4695 (1994). Sensitivity Analysis of a Polarizable Water Model.
41. S.-b. Zhu and C. F. Wong, J. Chem. Phys., 99, 9047 (1993). Sensitivity Analysis of Distribution Functions of Liquid Water.
42. M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids, Oxford University Press, Oxford, 1987.
43. W. Yu, C. F. Wong, and J. Zhang, J. Phys. Chem., 100, 15280 (1996). Brownian Dynamics Simulations of Polyalanine in Salt Solutions.
44. S. Huston and P. J. Rossky, J. Phys. Chem., 93, 7888 (1989). Free Energies of Association for the Sodium-Dimethyl Phosphate Ion Pair in Aqueous Solution.
45. J. S. Bader and D. Chandler, J. Phys. Chem., 96, 6423 (1992). Computer Simulation Study of the Mean Forces Between Ferrous and Ferric Ions in Water.
46. H. Schreiber and O. Steinhauser, Biochemistry, 31, 5856 (1992). Cutoff Size Does Strongly Influence Molecular Dynamics Results on Solvated Polypeptides.
47. G. K. Ackers and F. R. Smith, Annu. Rev. Biochem., 54, 597 (1985). Effects of Site-Specific Amino Acid Modification on Protein Interactions and Biological Function.
48. S. M. Green and D. Shortle, Biochemistry, 32, 10131 (1993). Patterns of Nonadditivity Between Pairs of Stability Mutations in Staphylococcal Nuclease.
49. V. J. LiCata and G. K. Ackers, Biochemistry, 34, 3133 (1995). Long-Range, Small Magnitude Nonadditivity of Mutational Effects in Proteins.
50. A. R. Fersht, A. Matouschek, and L. Serrano, J. Mol. Biol., 224, 771 (1992). The Folding of an Enzyme. I. Theory of Protein Engineering Analysis of Stability and Pathway of Protein Folding.
51. P. R. Gerber, A. E. Mark, and W. F. van Gunsteren, J. Comput.-Aided Mol. Design, 7, 305 (1993). An Approximate But Efficient Method to Calculate Free Energy Trends by Computer Simulation: Application to Dihydrofolate Reductase-Inhibitor Complexes.
52. P. Cieplak, D. A. Pearlman, and P. A. Kollman, J. Chem. Phys., 101, 627 (1994). Walking on the Free Energy Hypersurface of the 18-Crown-6 Ion System Using Free Energy Derivatives.
53. P. Cieplak and P. A. Kollman, J. Mol. Recognit., 9, 103 (1996). A Technique to Study Molecular Recognition in Drug Design: Preliminary Application of Free Energy Derivatives to Inhibition of a Malarial Cysteine Protease.
54. R. M. Levy, M. Belhadj, and D. B. Kitchen, J. Chem. Phys., 95, 3627 (1991). Gaussian Fluctuation Formula for Electrostatic Free-Energy Changes in Solution.
55. G. S. Del Buono, E. Freire, and R. M. Levy, Proteins: Struct., Funct., Genet., 20, 85 (1994). Intrinsic pKas of Ionizable Residues in Proteins: An Explicit Solvent Calculation for Lysozyme.
56. T. Simonson, C. F. Wong, and A. T. Brünger, J. Phys. Chem. A, 101, 1935 (1997). Classical and Quantum Simulations of Tryptophan in Solution.
57. R. W. Zwanzig, J. Chem. Phys., 22, 1420 (1954). High-Temperature Equation of State by Perturbation Method. I. Nonpolar Gases.
58. J. Åqvist, C. Medina, and J.-E. Samuelsson, Protein Eng., 7, 385 (1994). A New Method for Predicting Binding Affinity in Computer-Aided Drug Design.
59. A. Warshel and S. T. Russell, Q. Rev. Biophys., 17, 283 (1984). Calculations of Electrostatic Interactions in Biological Systems and in Solutions.
60. B. Roux, H.-a. Yu, and M. Karplus, J. Phys. Chem., 94, 4683 (1990). Molecular Basis for the Born Model of Ion Solvation.
61. H. A. Carlson and W. L. Jorgensen, J. Phys. Chem., 99, 10667 (1995). An Extended Linear Response Method for Determining Free Energies of Hydration.
62. M. D. Paulsen and R. L. Ornstein, Protein Eng., 9, 567 (1996). Binding Free Energy Calculations for P450cam-Substrate Complexes.
63. A. Amadei, A. B. M. Linssen, and H. J. C. Berendsen, Proteins: Struct., Funct., Genet., 17, 412 (1993). Essential Dynamics of Proteins.
64. D. M. F. van Aalten and A. Amadei, Proteins: Struct., Funct., Genet., 22, 45 (1995). The Essential Dynamics of Thermolysin: Confirmation of the Hinge-Bending Motion and Comparison of Simulations in Vacuum and Water.
65. R. M. Scheek, N. A. J. Van Nuland, B. L. De Groot, A. B. M. Linssen, and A. Amadei, J. Biomol. NMR, 6, 106 (1995). Structure from NMR and Molecular Dynamics: Distance Restraining Inhibits Motion in Essential Subspace.
66. D. van der Spoel, B. L. de Groot, S. Hayward, H. J. C. Berendsen, and H. J. Vogel, Protein Sci., 5, 2044 (1996). Bending of the Calmodulin Central Helix: A Theoretical Study.
67. A. Garcia, Phys. Rev. Lett., 68, 2696 (1992). Large-Amplitude Nonlinear Motions in Proteins.
68. T. Ichiye and M. Karplus, Proteins: Struct., Funct., Genet., 11, 205 (1991). Collective Motions in Proteins: A Covariance Analysis of Atomic Fluctuations in Molecular Dynamics and Normal Mode Simulations.
326 SensitivityAnalysis in Biomolecular Simulation 69. A. Kitao, F. Hirata, and N. Gd, Chem. Phys., 158,447 (1991).The Effects of Solvent on the Conformation and the Collective Motions of Protein: Normal Mode Analysis and Molecular Dynamics Simulations of Melittin in Water and in Vacuum. 70. S. Hayward, A. Kitao, F. Hirata, and N. G6,/. Mol. Biol., 234, 1207 (1993).Effect of Solvent on Collective Motions in Globular Proteins. 71. N. Kobayashi, T. Yamato, and N. Go, Proteins: Struct., Funct., Genet., 28, 109 (1997). Mechanical Property of a TIM-Barrel Protein. 72. N. G6, T. Noguti,and T. Nishikawa, Proc. Natl. Acad. Sci. U.S.A., 80,3696 (1983).Dynamics of a Small Globular Protein in Terms of Low-Frequency Vibrational Modes. 73. B. R. Brooks and M. Karplus, Proc. Natl. Acad. Sci. U.S.A., 80,6571 (1983). Harmonic Dynamics of Proteins: Normal Modes and Fluctuations in Bovine Pancreatic Trypsin Inhibitor. 74. M. Levitt, C. Sander, and P. S. Stern, J. Mol. B i d , 181, 423 (1985). Protein Normal-Mode Dynamics: Trypsin Inhibitor, Crambin, Ribonuclease and Lysozyme. 75. T. Simonson and D. Perahia, Biophys./., 61,410 (1992).Normal Modes of Symmetric Protein Assemblies: Application to the Tobacco Mosaic Virus Protein Disk. 76. R. C. Wade, Trans. Biochem. Soc., 24, 254 (1996). Brownian Dynamics Simulations of Enzyme-Substrate Encounter. 77. S. H. Northrup, S. A. Allison, and J. A. McCammon,]. Chem. Phys., 80,1517 (1984).Brownian Dynamics Simulation of Diffusion-Influenced Biomolecular Reactions. 78. S. H. Northrup, J. 0. Boles, and J. C. L. Reynolds, Science, 241, 67 (1988). Brownian Dynamics of Cytochrome c and Cytochrome c Peroxidase Association. 79. R. C. Wade, M. E. Davis, and B. A. Luty, Biophys./., 64, 9 (1993). Gating of the Active Site of Triose Phosphate Isomerase: Brownian Dynamics Simulations of Flexible Peptide Loops in the Enzyme. 80. W. L. Jorgensen and J. Tirado-Rives, /. Am. Chem. Soc., 110, 1657 (1988). The OPLS Potential Functions for Proteins. 
Energy Minimizations for Crystals of Cyclic Peptides and Crambin. 81. C. I. Bayly, P. Cieplak, W. D. Cornell, and P. A. Kollman, 1. Phys. Chem., 97, 10269 (1993). A Well-Behaved Electrostatic Potential Based Method Using Charge Restraints for Deriving Atomic Charges: The RESP Model. 82. F. A. Momany,J. Phys. Chem., 82,592 (1978).Determination of Partial Atomic Charges from Ab Initio Molecular Electrostatic Potentials. Application to Formamide, Methanol, and Formic Acid. 83. S. R. Cox and D. E. Williams,]. Comput. Chem., 2,304 (1981).Representation of the Molecular Electrostatic Potential by a Net Atomic Charge Model. 84. U. C. Singh and P. A. Kollman, /. Comput. Chem., 5 , 129 (1984).An Approach to Computing Electrostatic Charges for Molecules. 85. C. F. Wong, J. Am. Chem. Soc., 113, 3208 (1991). Systematic Sensitivity Analyses in Free Energy Perturbation Calculations.
CHAPTER 7
Computer Simulation to Predict Possible Crystal Polymorphs

Paul Verwer* and Frank J. J. Leusen†

*CAOS/CAMM Center, University of Nijmegen, P.O. Box 9020, 6500 GL Nijmegen, The Netherlands, and †Molecular Simulations Ltd., 240/250 The Quorum, Barnwell Road, Cambridge, CB5 8RE, United Kingdom
Reviews in Computational Chemistry, Volume 12. Kenny B. Lipkowitz and Donald B. Boyd, Editors. Wiley-VCH, John Wiley and Sons, Inc., New York, © 1998.

INTRODUCTION

Organic molecular solids are often obtained in crystalline form, either as single crystals or as a crystalline powder. The specific stacking of molecules in the crystal, the crystal packing, can influence important properties of the material, including density, color, taste, solubility, rate of dissolution, hygroscopic properties, melting point, chemical stability, conductivity, optical properties, and morphology. Crystallization of a given compound need not always lead to the same packing. Different crystal structures of the same compound (polymorphs) can often be observed, depending on crystallization conditions.¹

Polymorphism poses a problem if it leads to the unexpected formation of different crystal structures in commercial crystallization processes. This behavior is especially important in the pharmaceutical industry. Differences in macroscopic crystal shape (morphology) between polymorphs may, for example, lead to problems during filtration, and the shelf life of the final product may change as a result of changed chemical stability. At the same time, polymorphism may be exploited by selecting the polymorph that has optimal properties or is not protected by patent.

Knowledge of the three-dimensional atomic structure of a crystal will generally be the basis for an understanding of its characteristics. For crystals of sufficient quality and size (0.1 × 0.1 × 0.1 mm³ being a practical minimum), the structure can be determined accurately and reliably via single crystal X-ray diffraction. However, suitable crystals of the compound under investigation cannot always be grown. In some cases, only a powder, thin needles, or platelets can be obtained. In other cases, factors such as a large mosaic spread (i.e., the slight misalignment of small crystal blocks²ᵃ), twinning, or radiation-induced decay of the crystal may hamper structure determination via X-ray diffraction. Crystal structure prediction by computer simulation can be used to propose possible structures in those cases, or in cases of a compound that has not yet been synthesized. In cases of the latter kind, important properties (e.g., the density of new explosive materials or the color of new organic pigments) of still hypothetical structures may be predicted. Different routes of arriving at the crystal structure of a molecule are shown in Figure 1.
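[Figure 1: Different routes of arriving at the crystal structure of a molecule. Diagram labels that survive extraction: single crystal diffraction, powder diffraction, trial structure.]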
Crystal packing simulations are by no means a recent concept. The field was pioneered by, among others, Kitaigorodskii³ and Williams in the 1960s, and crystal structure determinations by molecular packing analysis were reported some 30 years ago. The determination in 1966 of the crystal structure of dibenzoylmethane by Williams⁴ provides an early example. In this study, the molecule was kept rigid, and the deviation from expected minimum nonbonded distances was used as a quality measure. Verification and refinement of the proposed packing was done by means of X-ray diffraction data. Later work⁵ used a more refined potential energy function. The computer program PCKS,⁶ which minimizes nonbonded interatomic close contacts, was used in the generation of a starting model in the X-ray structure determination of the free radical 2,4,6-triphenylverdazyl.⁷ Here the most intense X-ray reflection was used to obtain an initial angular orientation of the planar molecule. After optimization of several trial structures with PCKS, followed by systematic variation of the three torsional degrees of freedom, a closely packed model structure was obtained, and then refined by means of the single crystal X-ray data. PCKS was also used in 1972 by Zugenmaier and Sarko in their analysis of six monosaccharides.⁸ These investigators were able to generate crystal packings close to those observed with X-ray diffraction by moving a rigid molecule and its symmetry-related copies in a unit cell of known dimensions, minimizing repulsion between nonbonded atoms. The aim of this work was to develop a method capable of predicting polysaccharide structures, for which single crystal X-ray diffraction data were (and still are) difficult to obtain.
In an analysis in 1973 of the effect of packing on the conformation of triphenylphosphine, Brock and Ibers⁹ used the 1972 version of Busing's program WMIN.¹⁰ The calculations extended beyond the minimization of close contacts between rigid molecules, taking into account internal molecular strain, and van der Waals and Coulomb interactions. Another example of the application of WMIN is the generation of the initial packing in the crystal structure determination of 2-amino-4-methylpyridine,¹¹ as subsequently refined using X-ray data. In this work, which was part of a systematic investigation of simple organic analogs to molecules of biological interest, computer simulation of crystal packing was essentially used to solve the phase problem in X-ray crystallography.

In spite of these early successes, computational methods and algorithms that allow packing simulations with many degrees of freedom, including unit cell parameters and molecular flexibility, were not developed until this decade, making true "ab initio" crystal structure predictions possible on the basis of molecular information alone. In this chapter, we survey recently developed algorithms for predicting low energy crystal packings and discuss speed, accuracy, and other aspects of the computational methods involved. We mention the pitfalls encountered in the predictions of crystal structures, and illustrate how experimental information can help speed up the process of generating and identifying the correct structure from a large set of possibilities.
We limit our coverage to crystal structures of small organic molecules. Simulations of other materials (e.g., metal oxides, biopolymers), which often require their own, even more arduous approach to obtain accurate and reliable results, are not discussed here.
THEORY AND COMPUTATIONAL APPROACHES

Crystals

A crystal can be described as a three-dimensional stacking of identical building blocks, the unit cells. Predicting the structure of the crystal thus means predicting the size and shape of the unit cell and the positions of the atoms in it. The magnitudes of the spanning (or lattice) vectors (a, b, c) and the angles between them ($\alpha$, $\beta$, $\gamma$) define the unit cell (see Figure 2). In combination with the fractional coordinates of the atoms, the six cell parameters specify the crystal structure.

Figure 2 The vectors defining the unit cell, a, b, and c, and the angles between those vectors, $\alpha$, $\beta$, and $\gamma$.

In addition to the lattice symmetry, the atomic arrangement in a crystal often displays extra symmetry (e.g., mirror, rotational, inversion, translational) within the unit cell. If no translational symmetry is present within the unit cell, the cell is called "primitive." In certain cases it is possible to define a larger unit cell, which has additional symmetry.²ᵇ This larger cell is said to be centered, and it displays extra translational symmetry, for example, along a translation vector ½(a + b). The specific combination of symmetry elements present in a crystal structure defines its space group, and in three dimensions 230 different space groups can be constructed.²ᵇ If symmetry is present within the unit cell, only the coordinates of a unique part of the structure, the asymmetric unit, and the space group are necessary to define the positions of all atoms in the unit cell.

For molecular crystals, the number of molecules per unit cell is labeled Z. The number of molecules per asymmetric unit is usually called Z′. If the molecule is itself symmetrical, the number of molecules per asymmetric unit can be a fraction. Crystal structure prediction software will usually ignore this intramolecular symmetry.

The choice of cell parameters is not unique because the same lattice can be described by different sets of parameters. Rules exist for obtaining a set of standard cell parameters for a given lattice, called the conventional cell.¹²,¹³ The "reduced cell" is the standard primitive (noncentered) cell to describe a given lattice. Andrews and Bernstein¹⁴ describe a method to determine the reduced cell from a given set of cell parameters, and in that report an overview of earlier methods is presented. More recently, a new algorithm for this purpose was developed by Zuo et al.¹⁵
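The statement that six cell parameters plus fractional coordinates specify the crystal structure can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the orientation convention (a along x, b in the xy plane) is one common choice, the function names are hypothetical, and the monoclinic cell used as input is invented for illustration.

```python
import numpy as np

def lattice_vectors(a, b, c, alpha, beta, gamma):
    """Return a 3x3 array whose rows are the lattice vectors a, b, c.

    Lengths are in angstroms and angles in degrees. Assumed convention:
    a lies along x and b lies in the xy plane.
    """
    al, be, ga = np.radians([alpha, beta, gamma])
    cx = c * np.cos(be)
    cy = c * (np.cos(al) - np.cos(be) * np.cos(ga)) / np.sin(ga)
    cz = np.sqrt(c**2 - cx**2 - cy**2)
    return np.array([
        [a, 0.0, 0.0],
        [b * np.cos(ga), b * np.sin(ga), 0.0],
        [cx, cy, cz],
    ])

def fractional_to_cartesian(frac, cell):
    """Convert fractional coordinates to Cartesian coordinates."""
    return np.asarray(frac, dtype=float) @ cell

# Hypothetical monoclinic cell (values chosen only for illustration).
cell = lattice_vectors(5.0, 6.0, 7.0, 90.0, 100.0, 90.0)
volume = abs(np.linalg.det(cell))          # unit cell volume
xyz = fractional_to_cartesian([0.5, 0.5, 0.5], cell)  # cell center
```

Because the same lattice can be described by different parameter sets, a function of this kind maps many inputs onto equivalent lattices; it does not perform cell reduction.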
Thermodynamics

The relative stability of polymorphs at a given temperature and pressure is determined by their differences in Gibbs energy, $\Delta G$:

$$\Delta G = \Delta U + p\,\Delta V - T\,\Delta S \qquad [1]$$

with energy U, pressure p, volume V, temperature T, and entropy S. Thus $\Delta G$ depends on differences in packing energy, crystal density, and entropy. The contribution of $p\,\Delta V$ is negligible at normal pressure because differences in density between polymorphs for small organic molecules rarely exceed a few percent.¹⁶ Even a difference in density of 10% would amount to an energy difference due to $p\,\Delta V$ of only about 1 cal/mol for an organic molecule of mass 300, as in, say, a typical steroid. This is three orders of magnitude smaller than the differences in U that can be found in energy calculations on pairs of steroid polymorphs, where energy differences of a few kilocalories per mole are common. For simulations of high pressure phases, the pressure term can become significant, however, as in the example of benzene at 25 kbar studied by Gibson and Scheraga.¹⁷ Entropy differences need not be negligible at room temperature, but they are usually ignored because reliable calculation¹⁸ is not always straightforward. Effectively, then, the energy at 0 K is taken, leaving only U as the quantity to be calculated. Further assuming that structures with a calculated low energy U are good candidates to be observed in reality, most crystal structure prediction methods rank the predicted structures accordingly. Hence, the true free energy at a given temperature may differ somewhat from the calculated value, not only because such approximations were made but also because of inaccuracies in the calculation of U. A hypothetical relationship between true and calculated energy is depicted in Figure 3.

Figure 3 True and modeled relative energies of polymorphs. (Labels recoverable from the figure: relative energy, at 0 K.)

In practice, the thermodynamic stability need not be the decisive factor in crystallization because kinetics plays an important role, influenced by such crystallization conditions as supersaturation and solvent environment. The macroscopic shape (morphology) of a crystal is also known to depend on the solvent environment; for examples and references see Weissbuch et al.¹⁹ or Davey et al.²⁰ This dependence is usually attributed to the influence of a solvent on the growth rates of different crystal surfaces. Although solvent has no effect on the relative thermodynamic stability of different polymorphs, it can be a key factor in specific crystallization of one particular polymorph by favoring its growth kinetically. Therefore, calculation of quantities such as U or G for a set of predicted polymorphs is unable, in general, to provide a conclusive answer as to which crystal (or crystalline powder) will be observed. Experimental data, like a powder diffraction pattern that is highly specific for a given crystal packing, often are essential for picking the true structure(s) in a set of possible low energy polymorphs from computer simulation.

In conclusion, the crystal structure observed experimentally need not have the lowest possible Gibbs energy; depending on the crystallization conditions, different polymorphs may be grown. It can be assumed, though, that the observed structure is among those having a relatively low Gibbs free energy. The latter can be approximated by the energy U, which is considerably easier to calculate than G.
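The order-of-magnitude estimate of the $p\,\Delta V$ term given above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the crystal density of 1.2 g/cm³ is an assumed typical value (it is not given in the text), the 10% density difference is approximated as a 10% change in molar volume, and the function name is hypothetical.

```python
# Rough check of the p*dV contribution; all inputs are illustrative assumptions.
ATM_IN_PA = 101325.0      # 1 atm in pascal
KBAR_IN_PA = 1.0e8        # 1 kbar in pascal
J_PER_CAL = 4.184

molar_mass = 300.0        # g/mol, as in the steroid example
density = 1.2             # g/cm^3, assumed typical organic crystal density
density_change = 0.10     # 10% difference between polymorphs

molar_volume_cm3 = molar_mass / density                    # cm^3/mol
delta_v_m3 = density_change * molar_volume_cm3 * 1.0e-6    # m^3/mol

def p_delta_v_cal_per_mol(pressure_pa):
    """Return p * delta V in cal/mol for the volume change defined above."""
    return pressure_pa * delta_v_m3 / J_PER_CAL

ambient = p_delta_v_cal_per_mol(ATM_IN_PA)
high_pressure = p_delta_v_cal_per_mol(25.0 * KBAR_IN_PA)

print(f"p*dV at 1 atm:   {ambient:.2f} cal/mol")
print(f"p*dV at 25 kbar: {high_pressure / 1000.0:.1f} kcal/mol")
```

Under these assumptions the ambient-pressure term stays below 1 cal/mol, whereas at 25 kbar the same volume change would correspond to several kilocalories per mole, which is comparable to typical lattice energy differences between polymorphs.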
Computational Techniques

A number of computational methods are frequently used in crystal structure prediction programs. To assess the differences between the programs, we list the most important techniques commonly used and mention their strengths and weaknesses.

Potential Energy Functions

The most common method used to calculate the energy of a structure relies on a force field as implemented in molecular mechanics (MM). Reviews on this technique can be found in Bowen and Allinger,²¹ Dinur and Hagler,²² and Pettersson and Liljefors.²³ The MM energy is calculated as a sum of readily identifiable parts, arising from bond stretching ($E_s$), angle bending ($E_b$), torsional interactions ($E_{tor}$), van der Waals interactions ($E_{vdW}$), and electrostatic interactions ($E_{elec}$). The total energy is given by:

$$E_{total} = E_s + E_b + E_{tor} + E_{vdW} + E_{elec} \qquad [2]$$
A separate energy term to account for hydrogen bonds is used in some force fields, and cross-terms (such as stretch-bend) are often used. Not all terms in Eq. [2] are always relevant to each crystal prediction method. If the molecules are considered to be rigid, $E_s$, $E_b$, and $E_{tor}$ are irrelevant because they depend on intramolecular features that do not change between different crystal structures during an energy minimization of the lattice. The obvious advantage of imposing rigidity is that energy calculations and minimizations can be done more quickly, because the number of degrees of freedom is much reduced. Treating molecules as rigid bodies can be a valid approach if the molecules are known to be rigid or have negligible flexibility, as in many of the fused aromatic ring systems used by Chaka et al.,²⁴ or if crystal structures of similar molecules tend to have the same conformation, as can be observed in paraffins. Here, the Cambridge Structural Database²⁵ (CSD) is a valuable source of information. Further simplifications to the MM energy function can be introduced, such as ignoring the electrostatic contribution for nonpolar compounds, and/or using mainly the repulsive part of the van der Waals function by ignoring the van der Waals energy for atom pairs with an interatomic distance greater than a certain threshold.²⁶ Obviously, these simplifications will generally improve the speed of calculations, but they do so at some cost in accuracy.

Quantum Mechanics

Quantum mechanics (QM) provides a different means of calculating the internal energy of crystal structures. Software capable of calculating the energy of periodic systems using Hartree-Fock or density functional theory exists (e.g., CASTEP,²⁷ Crystal95,²⁸ FHI96MD²⁹). Unfortunately these programs do not always provide the functionality to carry out energy minimizations. Moreover, these methods are orders of magnitude slower than MM, and, because they account poorly for electron dispersion (correlation), they are not necessarily more accurate in their application to molecular crystals. Currently, the main application of QM calculations on periodic systems seems to be focused on smaller, inorganic compounds like metal oxides, and not (yet) on the crystal structures of organic molecules that are handled easily and reliably by MM.
Scoring Functions

A scoring function is a simple mathematical function (compared to a complete set of MM energy potentials) that estimates energies. It is often used to provide rough energies for a large number of similar molecular systems (e.g., when one is calculating the interaction energy between one molecule and a range of other molecules, each in a number of different orientations, as is done when docking a series of related ligands in a receptor). This approach was used by Hofmann and Lengauer³⁰ to approximate the energies of predicted polymorphs, relying on a scoring function that is derived statistically from a set of observed crystal structures taken from the CSD. The distribution of observed interatomic distances is used to calculate a pair potential function for a given pair of atom types.
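How a distance distribution can be turned into a pair potential may be sketched as follows. The text does not give the functional form used by Hofmann and Lengauer, so the sketch assumes a generic inverse Boltzmann relation, $E(r) = -kT \ln[n_{obs}(r)/n_{ref}(r)]$, with a uniform-density reference state; the function name, the bin settings, the pseudocount, and the synthetic distance data are all hypothetical.

```python
import numpy as np

def statistical_pair_potential(distances, r_max=8.0, n_bins=16, kT=1.0,
                               pseudocount=1.0):
    """Inverse-Boltzmann pair potential from observed distances (a sketch).

    The reference state assumes uniform density, so expected counts scale
    with the volume of each spherical shell. The pseudocount avoids log(0)
    in empty bins.
    """
    edges = np.linspace(0.0, r_max, n_bins + 1)
    observed, _ = np.histogram(distances, bins=edges)
    shell_volume = edges[1:] ** 3 - edges[:-1] ** 3
    reference = shell_volume / shell_volume.sum() * observed.sum()
    ratio = (observed + pseudocount) / (reference + pseudocount)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, -kT * np.log(ratio)

# Synthetic "observed" distances for one hypothetical pair of atom types.
rng = np.random.default_rng(0)
distances = rng.normal(loc=3.8, scale=0.3, size=2000)

centers, potential = statistical_pair_potential(distances)
best_distance = centers[np.argmin(potential)]  # most favorable distance bin
```

Distances that occur more often than the reference state predicts receive a negative (favorable) score, which is the sense in which such a function estimates energies without an explicit force field.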
Charge Models

The electrostatic contribution to the MM energy is usually calculated as the pairwise interaction between point charges placed on each atom. The resulting energy is easily calculated via Coulomb's law:

$$E_{elec} = \frac{q_i q_j}{D\,r_{ij}} \qquad [3]$$

with $q_i$ and $q_j$ the charges on the atoms i and j, and $r_{ij}$ the distance between those atoms; D is the effective dielectric constant. Numerous ways exist to assign atomic charges, some of which are computationally inexpensive (e.g., the methods developed by Gasteiger and Marsili,³¹ and by Rappé and Goddard³²). Other methods, such as Mulliken charges³³ or charges fitted to the molecular electrostatic potential (ESP),³⁴,³⁵ use the results of QM calculations. Methods to assign atomic charges have been reviewed by Williams³⁶ and Bachrach.³⁷ Unlike, for example, bond angles, atomic point charges do not represent a physically defined quantity: they are merely a representation that accounts for the effects of a particular electronic distribution throughout the molecule. Because these electronic effects can (in part) be taken into account implicitly in a force field, the choice of the best set of atomic charges generally depends on the force field selected; force fields are usually parameterized using a particular atomic charge scheme.

Electrostatic interactions depend on the electric field around the molecule, which in turn is determined by the molecule's electron density distribution. It is thus sensible to use atomic charges that optimally reproduce the electrostatic potential in the vicinity of the molecule, so-called ESP-derived charges.³⁴,³⁵ From the electron density distributions calculated by QM packages like MOPAC,³⁸ Gaussian,³⁹ GAMESS-US,⁴⁰ and GAMESS-UK,⁴¹ the corresponding ESP atomic charges can be fitted, either within the program itself or via a separate program. The program MOLDEN,⁴²,⁴³ for example, can be used to generate ESP charges from Gaussian and GAMESS (US/UK) output; the program PDM93⁴⁴ also calculates ESP charges from QM wavefunctions. The general procedure involves choosing a set of points around the molecule, calculating the electrostatic potential at each point, and then fitting atomic charges that best reproduce those calculated potentials. The position of the points where the potential is calculated, as well as the number of points, can be different for different methods (CHELP,⁴⁵ Besler-Merz-Kollman,⁴⁶ CHELPG⁴⁷). However, ESP charges may be less well determined (mathematically) for atoms that are shielded by surrounding atoms: for example, the carbon atom in a methyl group, or more generally, any atom that is not at the surface of a molecule. The resulting artifacts may be avoided by applying appropriate restraints when these atomic charges are derived.⁴⁸

A difficulty also arises if a molecule has conformational flexibility: the ESP charges are likely to be dependent on the molecular conformation, and the molecule may adopt conformations in the proposed crystal structures that are different from that used in the ESP charge calculation. One solution to this problem was proposed by Reynolds, Essex, and Richards,⁴⁹ who fitted atomic charges to electrostatic potentials for several conformations, weighted with the appropriate Boltzmann factor. They applied their method only to alcohols and threonine.
In some cases a computationally less expensive alternative is the method of Bayly, Cieplak, Cornell, and Kollman,⁴⁸ which can force identical charges on atoms that are equivalent through rotational freedom, such as the hydrogens in methyl groups.

Atomic multipoles can be used as an alternative for atomic point charges,⁵⁰,⁵¹ and they are expected to give a better representation of the nonspherical features of the electron density distribution, as found in lone pairs and certain π-electron densities. For example,⁵¹ optimized crystal structures of acetic acid using atomic multipoles were indeed closer to the experimental structure than those based on atomic point charges. The drawback of this method is that the calculation of the electrostatic term is more CPU-intensive than with a point charge model.
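The general ESP fitting procedure described above (choose points around the molecule, evaluate the potential there, fit charges that reproduce it) can be sketched as a constrained linear least-squares problem. This is a minimal illustration, not the algorithm of any of the programs named above: the function name is hypothetical, the total charge is fixed with a Lagrange multiplier, no restraints are applied, and the "reference" potential is generated from known point charges in consistent arbitrary units rather than from a QM calculation.

```python
import numpy as np

def fit_esp_charges(atom_xyz, grid_xyz, grid_potential, total_charge=0.0):
    """Least-squares fit of atomic point charges to potentials on a grid.

    Minimizes sum_k (V_k - sum_i q_i / r_ki)^2 subject to sum_i q_i =
    total_charge, using a Lagrange multiplier (dielectric constant 1).
    """
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    grid_xyz = np.asarray(grid_xyz, dtype=float)
    dist = np.linalg.norm(grid_xyz[:, None, :] - atom_xyz[None, :, :], axis=2)
    design = 1.0 / dist                       # design[k, i] = 1 / r_ki
    n = atom_xyz.shape[0]
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = design.T @ design
    kkt[:n, n] = 1.0                          # multiplier column
    kkt[n, :n] = 1.0                          # total-charge constraint row
    rhs = np.concatenate([design.T @ grid_potential, [total_charge]])
    return np.linalg.solve(kkt, rhs)[:n]

# Hypothetical three-atom molecule with known charges (illustration only).
atoms = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
true_charges = np.array([-0.8, 0.4, 0.4])

# Sample points on shells well outside the molecule.
rng = np.random.default_rng(1)
directions = rng.normal(size=(200, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
grid = directions * rng.uniform(3.0, 5.0, size=(200, 1))

# Stand-in for a QM electrostatic potential: generated from the true charges.
r = np.linalg.norm(grid[:, None, :] - atoms[None, :, :], axis=2)
reference_potential = (true_charges / r).sum(axis=1)

fitted = fit_esp_charges(atoms, grid, reference_potential)
```

For buried atoms the corresponding columns of the design matrix become nearly linearly dependent, which is one way to see why their charges are less well determined and why restraints can help.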
Calculations Under Periodic Conditions

The MM energy of a crystal is usually calculated for the asymmetric unit of a unit cell that is supposed to be part of an infinite lattice; thus, the cell is surrounded by an infinite number of identical cells. This convention has no technical implications for the calculation of the terms in the MM energy function that are restricted to atoms within the same molecule (bond-stretching, angle-bending, and torsional interactions, the so-called bonded interactions); these terms are made up of a limited number of contributions. On the other hand, a difficulty arises in the calculation of nonbonded interactions, which occur between all atoms in the complete crystal, with the result that the number of interactions becomes unmanageably large.

One way to deal with the fact that each atom in a periodic system has an infinite number of van der Waals and electrostatic interactions is to use a cutoff radius: interactions between atoms separated by an interatomic distance larger than a predefined value (the cutoff radius) are neglected. This will lead to a systematic error in the van der Waals energy because the neglected part will always be negative (the repulsive van der Waals interactions occur only when the interatomic distance is small, and these interactions are always included). Fortunately, since the magnitude of van der Waals interactions decreases rapidly with increasing distance (following $r^{-6}$), the resulting error can be made acceptably small. Electrostatic interactions, however, are more problematic, because their magnitude decreases only slowly, as $1/r$, leading to a slow convergence of the resulting energy for electrically neutral systems (see, e.g., Leusen et al.,⁵² their Figure 4, or Gibson and Scheraga,⁵³ their Table 3). The net electrostatic interaction of a point charge with all point charges within the cutoff distance will often suffer from large fluctuations as the cutoff radius is changed. This implies that even for relatively large cutoff radii, the calculated Coulombic energy can have a significant error. A major improvement can be achieved by grouping the atoms (in the case of atomic point charges) into so-called charge groups, small clusters that have no net charge, and including interactions with all atoms within a charge group if one of its members (or its center of mass) is inside the cutoff radius.

A more rigorous approach is the Ewald summation.⁵⁴,⁵⁵ This method, first presented by P. P. Ewald in 1921, exploits the periodicity of the system by calculating part of the summations in reciprocal space. When this method is used, the energy converges much faster, and a more accurate result can be obtained. In their study of crystal packing, Gibson and Scheraga⁵³ used a cutoff radius in the calculation of van der Waals and hydrogen bond terms, and Ewald summation for the Coulombic contributions to the energy. Gibson and Scheraga concluded that there is little influence on the final lattice parameters and the number of iterations in the minimization when the cutoff radius is varied. The absolute energy of the minimized structures did change significantly, however.

Recently, van Eijck and Kroon⁵⁶ discussed the implications of the dependence of the electrostatic energy of a crystal on its macroscopic shape if the crystal has a nonzero dipole moment. Ewald summation then results in the lowest possible energy. This minimum energy corresponds to the situation that the crystal finds an energetically optimal shape (a needle, with the dipole moment directed along the needle axis, or a platelet with the dipole moment in its plane) and/or that the charges on the crystal surface are counterbalanced by external charges. The latter situation is sometimes referred to as "tinfoil boundary conditions" (because a conductive surface allows for the necessary redistribution of charge); crystallization in water or any other medium with a high dielectric constant may approach this situation. In their paper, van Eijck and Kroon concluded that the minimum achievable energy is best used in crystal structure predictions, thus assuming optimal crystal shape or tinfoil boundary conditions. Ewald summation will directly yield this energy; if a cutoff radius is used, a (simple) correction term must be added to the result.
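The different convergence behavior of the two nonbonded terms under a spherical cutoff can be seen in a toy calculation. The sketch below is an illustration under simplifying assumptions, not a crystal energy code: it uses a simple cubic lattice of unit spacing with alternating unit point charges (a rock-salt-like arrangement), evaluates only the attractive $r^{-6}$ part of the van der Waals interaction with unit prefactor, and the function name is hypothetical.

```python
import numpy as np

def cutoff_sums(cutoffs, n_max=12):
    """Lattice sums around the origin site for a series of cutoff radii.

    Sites sit on a simple cubic lattice with unit spacing and carry charges
    (-1)^(i+j+k); the origin site carries charge +1. Returns the attractive
    dispersion sum (-sum r^-6) and the Coulomb sum (sum q/r) per cutoff.
    """
    idx = np.arange(-n_max, n_max + 1)
    i, j, k = np.meshgrid(idx, idx, idx, indexing="ij")
    r = np.sqrt((i**2 + j**2 + k**2).astype(float)).ravel()
    sign = np.where((i + j + k) % 2 == 0, 1.0, -1.0).ravel()
    keep = r > 0.0                      # exclude the origin site itself
    r, sign = r[keep], sign[keep]
    dispersion, coulomb = [], []
    for rc in cutoffs:
        inside = r <= rc
        dispersion.append(-np.sum(r[inside] ** -6))
        coulomb.append(np.sum(sign[inside] / r[inside]))
    return np.array(dispersion), np.array(coulomb)

# All cutoffs fit inside the cube of lattice points generated above.
cutoffs = np.arange(4.0, 12.5, 0.5)
dispersion, coulomb = cutoff_sums(cutoffs)

for rc, e_disp, e_coul in zip(cutoffs, dispersion, coulomb):
    print(f"cutoff {rc:5.1f}  dispersion {e_disp:9.4f}  coulomb {e_coul:9.4f}")
```

In this toy model the dispersion sum settles quickly as the cutoff grows, whereas the Coulomb sum keeps jumping as shells of net charge enter the sphere, which is the behavior that charge groups and Ewald summation are meant to cure.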
Minimizers

Energy minimization is often an important part of the crystal structure prediction process. For example, many crude trial structures can be generated rapidly, but all must be optimized to obtain low energy crystal packings. The time spent on minimizing a trial structure is usually orders of magnitude longer than the time needed to generate it. Consequently, minimizations are often the most time-consuming step in the complete structure prediction process, and a fast minimization algorithm is important. Several minimization methods are well known, and ready-to-use program codes are available.⁵⁷

The large number of variables and the complexity of the energy function are among the factors making energy minimization a time-consuming process. Considerable speed-up may be achieved by introducing simplifications when the structure is still far from its minimum energy. For example, the number of variables may be reduced by treating the molecules as rigid bodies, and the energy function may be simplified by omitting certain contributions (e.g., electrostatic terms).⁵⁸ Switching between different minimization algorithms may also improve speed.⁵⁹,⁶⁰

The efficiency of widely used programs for rigid body minimization of crystal structures was criticized by Gibson and Scheraga.⁵³ They introduced a new algorithm, based on secant methods (computationally fast methods to compute derivative matrices⁶¹) that efficiently calculate the energy gradient with respect to the minimization variables.

The energy surface described by a force field is usually complicated, containing many local minima in addition to the minima corresponding to the true crystal structures. Simple minimization algorithms will generally produce a minimum energy structure that is near to its starting structure. This implies that many starting structures must be tried to find all relevant minima.
A way to avoid this problem is to agitate the structure at certain points during the simulation, allowing energy barriers between different minima to be overcome and making it possible to reach low energy minima from local minima with a higher energy. A similar idea is used in the method of simulated annealing, where the temperature of the simulated system, which determines the ease with which energy barriers can be overcome, is slowly lowered during the simulation. The OREMWA method⁶² uses simulated annealing, along with a fast minimizing algorithm, and is sometimes, but not always, capable of reaching a global minimum starting from local minima.⁶³ However, because observed crystal structures do not necessarily correspond to global energy minima (hence the occurrence of polymorphism), and because force fields are only approximations of the true energy function, a method that searches only for the global energy minimum seems inadequate. One is better served by searching for a set of low energy crystal packings, and simulated annealing can assist in this endeavor, especially when low energy structures encountered during the annealing process are stored for further analysis.
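The point made above, that a simple minimizer ends up near its starting structure so that many starting points are needed, can be illustrated with a minimal sketch. Everything here is an assumption chosen for illustration: the one-dimensional function `energy` is a toy stand-in for a force field, and plain steepest descent stands in for the far more efficient minimizers used in practice.

```python
import math

def energy(x):
    # Toy one-dimensional "energy surface" with several local minima
    # (hypothetical; not a force field).
    return 0.1 * x * x + math.sin(3.0 * x)

def gradient(x):
    # Analytical derivative of energy(x).
    return 0.2 * x + 3.0 * math.cos(3.0 * x)

def minimize(x, step=0.01, tol=1e-8, max_iter=100_000):
    # Plain steepest descent: walks downhill to the minimum of the basin
    # in which the starting point lies.
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x

def multi_start(starts, merge_tol=1e-3):
    # Minimize from every start; keep one representative per distinct minimum.
    minima = []
    for x0 in starts:
        x = minimize(x0)
        if all(abs(x - m) > merge_tol for m in minima):
            minima.append(x)
    return sorted(minima, key=energy)

starts = [-4.0 + 0.5 * i for i in range(17)]  # grid of starting points, -4.0 to 4.0
minima = multi_start(starts)
for x in minima:
    print(f"x = {x:7.3f}   E = {energy(x):7.3f}")
```

Several starting points fall into the same basin and converge to the same minimum, which is why the list of distinct minima is much shorter than the list of starts. Keeping the whole sorted list, rather than only its first entry, mirrors the recommendation to search for a set of low energy packings instead of the global minimum alone.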
Clustering of Similar Structures

At different stages of a crystal structure prediction, it may be necessary to reduce the number of structures under consideration. One way to do this is to cluster the generated structures and select one structure from each cluster. One type of clustering is the grouping of almost identical structures contained within the same local energy minimum after minimization. A comparison based on cell parameters is problematic, however, because different sets of cell parameters can be used to describe the same lattice, a problem that can (in part) be avoided by comparing the reduced cell parameters, which constitute a unique representation of the cell. Unfortunately, small deviations in coordinates can lead to very different angles for the reduced cell. Therefore, comparison based purely on reduced cell parameters cannot reliably identify similar structures. Fortunately, an alternative set of unique parameters, avoiding the discontinuities of the reduced cell angles, was suggested by Andrews, Bernstein, and Pelletier.⁶⁴

Karfunkel et al.⁶⁵ suggested a method to quantify similarity between crystal structures. It is based on the correspondence between X-ray powder diagrams, simulated for the predicted structures. The advantage of this method is that powder diagrams are independent of the mathematical description of the crystal lattice. Powder diagrams give the intensity of the reflected X-ray beam as a function of the reflection angle. Because the peaks are spikelike, and their position and height are very sensitive to small deviations in the structure, the overlap between the corresponding peaks of similar structures is rapidly lost. In this method, the intensity at each point in one powder diagram is compared with the intensity in the environment of the corresponding point in the other diagram (and vice versa).
Thus, the authors avoid the problem of rapid loss of the overlap itself (i.e., the strict point-to-point correspondence) if structures are not identical. The algorithm, written in FORTRAN-77, has been published. The commercially available Polymorph Predictor⁶⁶ program clusters structures by comparing lists of interatomic distances, an approach that is described in more detail in a later section.

A second type of clustering concerns the grouping of thousands or even millions of trial structures into a limited number of clusters (tens or hundreds), containing structures that have common features but are not identical. From each cluster a single structure is then minimized. Here, the aim is to represent as much structural diversity as possible in the small set of structures to be minimized. In principle, this type of clustering can be considered perfect if all structures in one cluster end up as the same crystal structure after minimization, and if all clusters represent different energy minima. From a practical point of view, clustering at this stage can be considered successful if it produces a small subset of all trial structures from which many different structures are obtained after minimization, thus reducing the total CPU time needed. Clustering methods that are sensitive to small structural changes cannot be expected to perform well because the differences between the trial structures are usually quite large.

An algorithm suitable for clustering crude trial structures as well as optimized structures was developed by van Eijck and Kroon.⁶⁷ Simplifying a procedure proposed earlier by Dzyabchenko⁶⁸ for measuring similarity, the authors base their approach on a comparison of cell parameters and the positions and orientations of structural fragments in different structures, taking into account the transformations allowed by space group symmetry. Currently, their method has been worked out only for the case of one rigid molecule present in the asymmetric unit. Because the method has to account for operations allowed by space group symmetry, it requires specific code for each space group.
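The idea behind the powder diagram comparison described above (comparing each point of one diagram with the neighborhood of the corresponding point in the other, and vice versa) can be sketched as follows. This is an illustrative assumption, not the published FORTRAN-77 algorithm: the window half-width, the use of the smallest absolute difference within the window, and the spike patterns are all hypothetical choices.

```python
def one_sided_mismatch(a, b, half_width):
    # For every point in pattern a, take the smallest intensity difference
    # to any point of pattern b within +/- half_width grid steps.
    n = len(a)
    total = 0.0
    for i, ia in enumerate(a):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        total += min(abs(ia - ib) for ib in b[lo:hi])
    return total

def pattern_dissimilarity(a, b, half_width=2):
    # Symmetric measure: compare a against b and b against a.
    if len(a) != len(b):
        raise ValueError("patterns must be sampled on the same angle grid")
    return 0.5 * (one_sided_mismatch(a, b, half_width)
                  + one_sided_mismatch(b, a, half_width))

def spike_pattern(n_points, peaks):
    # Hypothetical spikelike pattern: (grid index, intensity) pairs.
    pattern = [0.0] * n_points
    for position, height in peaks:
        pattern[position] = height
    return pattern

reference = spike_pattern(50, [(10, 100.0), (25, 60.0), (40, 30.0)])
shifted = spike_pattern(50, [(11, 100.0), (26, 60.0), (39, 30.0)])    # peaks moved by one step
different = spike_pattern(50, [(5, 100.0), (18, 60.0), (33, 30.0)])   # unrelated peak positions

print(pattern_dissimilarity(reference, shifted, half_width=0))   # strict point-to-point
print(pattern_dissimilarity(reference, shifted, half_width=2))   # neighborhood comparison
print(pattern_dissimilarity(reference, different, half_width=2))
```

With a zero half-width the comparison is strictly point-to-point, and a one-step peak shift already destroys all overlap. With a small window the shifted pattern is recognized as similar, while the unrelated pattern still scores as different.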
Crystal Structure Prediction Methods

A variety of computational methods for the prediction of crystal packing have emerged during the last decade. At least three approaches to constructing low energy crystal packings can be discerned:

1. Construction of low energy clusters of 10-50 molecules, which can be viewed as the nucleus from which the crystal will eventually grow. The center of such a cluster is assumed to be similar to the final crystal structure. Thus, the crystal structure is to be found by simulating the start of the crystallization process.
2. Construction of configurations containing 1-10 molecules, related by the desired symmetry elements, which are then subjected to lattice symmetry to form crystals. As in method 1, nonperiodic clusters are generated first, but here, instead of having a relatively large cluster size, translational symmetry is introduced to simulate a bulk environment.
3. Generation of a large set of crude molecular packings, subject to the desired space group symmetry, which are then energy optimized. Periodicity is assumed at all stages, and there is no initial consideration of aggregates of a small number of molecules.

We mention these approaches mainly to highlight the most characteristic features of different prediction methods. The efficiency, reliability, and general applicability of each method will vary with the particular implementation.
An example of the first approach is provided by Williams,⁶⁹ who modeled crystallization nuclei by minimizing the energy of clusters of 2-15 benzene molecules. It was argued, however, that this cluster size is too small to have a significant relation to the crystal structure.⁷⁰ Calculations on clusters of up to 42 benzene molecules⁷¹ reproduced the structure of the benzene crystal at the center of the cluster. Simulations of this type are in principle more straightforward than those imposing lattice symmetry on the final structure. To obtain a reasonable result using such an approach, however, a large number of molecules in the cluster must be used to reduce artificial surface effects. It has been claimed that some of the simulated clusters of benzene molecules correspond to clusters that are proposed in the interpretation of experimental data.⁶²

Several programs exist for the generation of small molecular clusters, which can then be put onto a three-dimensional grid. These include Gavezzotti's PROMET3⁷² and the recently developed FlexCryst.³⁰ The method of Perlstein⁷³⁻⁷⁷ also works more or less along these lines. The general idea behind these methods is that strong interactions, such as hydrogen bonds between a small number of molecules, play a decisive role in the formation of the complete crystal structure. Therefore in many cases the crystal structure can be based on a suitable low energy configuration of just a small number of molecules. This procedure may provide an efficient means of generating correct crystal structures in some instances, but it will be unreliable in cases where very stable clusters can be formed that are not observed experimentally. Examples of this include the dimers of acetic acid and alloxan.

PROMET3 generates clusters of molecules that are related by common symmetry operators (inversion center, screw, glide, and translation).⁸⁰ More than 80% of all structures in the CSD are in space groups formed by these symmetry elements. The clusters are then subjected to translational symmetry, thus creating a trial crystal structure. The molecules are kept rigid in the whole procedure. To obtain minimum energy structures, the trial structures are optimized in a subsequent step, by means of a separate energy minimization program. The method was recently applied to generate crystal packings for a coumarin derivative.

FlexCryst contains a number of uncommon features.³⁰ Unlike most other programs, it computes strong intermolecular interactions between functional groups, such as hydrogen bond centers and phenyl, methyl, and amide groups. Those are used to form clusters of molecules that fill the unit cell, to generate interactions between translated clusters of molecules, and to calculate the corresponding translation vectors. Suitable triples of these vectors, which constitute a three-dimensional cell with sufficiently strong interactions along the translation vectors, can then be used as possible lattice vectors. The energy of the resulting crystal structures is evaluated by means of a scoring function, which in turn is derived from the distribution of interatomic distances in known crystal structures in the CSD. The use of a simple function to evaluate energies, together with the absence of repeated optimization, makes the method very fast,
albeit at the expense of accuracy. If one considers the length of the difference vector between true and predicted lattice vectors as a measure of accuracy, FlexCryst generates errors of 0.7 Å compared to 0.1 Å for methods using minimization. These results are somewhat optimistically biased, however, because the molecular conformations found in the experimental crystal structures were used.

The method developed by Perlstein⁷³⁻⁷⁷ is based on the construction of stable one-dimensional aggregates that become the building blocks for two-dimensional aggregates, from which eventually three-dimensional structures may be constructed. The method, which builds on work by Scaringe and Perez,⁸¹ is currently implemented for the one- and two-dimensional stages. The one-dimensional aggregates are linear chains of molecules that are related by a symmetry operation, the most common operations being translation, glide, screw, and inversion. In the translation aggregate, all molecules are identical in geometry and orientation, and are positioned along a line according to a given repeat distance. In the glide aggregate, subsequent molecules are mirror images, because they are related by a combination of a mirror and a translation operation. In the screw aggregate, the operation that relates subsequent molecules is the combination of a rotation and a translation. The inversion aggregate is composed of molecules that are related by inversion points. Because of the large number of possibilities, a systematic search over all possible aggregates is impossible. To search for low energy aggregates more efficiently, Perlstein used a Monte Carlo (MC) procedure, using the orientational angles of the molecule and the repeat distances as variables. Efficiency was further improved by varying the MC temperature (from 4000 to 300 K) and the maximum allowed change in molecular orientation during the simulation.
Energies were calculated using electrostatic interactions based on Gasteiger atomic charges and van der Waals interactions as parameterized in the MM2⁸² force field. Depending on the symmetry element present, experimentally observed aggregates were usually found among the best 10-20 predicted.

A variety of methods exists for the generation of crystal structures by applying lattice symmetry at all stages (as opposed to generating nonperiodic aggregates first). Among them are MPA⁵⁹,⁸³ and MDCP.⁸⁴ Other methods, which apply full space group symmetry (including symmetry within the unit cell) to a given number of independent molecules in the cell, include MOLPAK,²⁶ UPACK,⁸⁵ ICE9,²⁴ the method of Schmidt and Englert,⁸⁶ and the Polymorph Predictor⁶⁶ of Molecular Simulations Inc. (MSI). Each of these programs is described below.

In MOLPAK²⁶ (molecular packing), lattice symmetry is introduced first in one dimension by generating a close packing of molecules along a line, and then extended to two and three dimensions in subsequent steps. During packing, space group constraints are accounted for by adding additional molecules, related via inversion, mirror plane, glide, twofold axis, or twofold screw axis symmetry as needed. The initial orientation of the central molecule is varied
systematically (the program is limited to a single molecule in the asymmetric unit). Trial structures have to be refined to obtain possible crystal packings; the authors of MOLPAK used the WMIN program⁸⁷ for rigid body refinement.

The crystal structure prediction program MPA⁵⁹,⁸³ (Molecular Packing Analysis) works by using a Monte Carlo procedure to randomly orient a given number of molecules in a trial unit cell. Following this stochastic sampling of trial unit cells, the energy of each cell is minimized with a rigid body optimizer. Although this procedure imposes no space group symmetry, symmetry may be present in the minimized structures as permitted by the number of molecules in the cell.

UPACK⁸⁵ (Utrecht crystal packer) was originally developed for the specific problems of predicting crystal structures of monosaccharides,⁸⁸ flexible molecules that form hydrogen-bonded structures. It generates trial structures systematically, which are then subjected to a rough rigid body minimization. Hydrogen atoms of hydroxyl groups are not included in the model at this stage; rather, the hydroxyl groups are treated as united atoms. After this first quick minimization, equivalent structures are removed by means of a dedicated clustering algorithm.⁶⁷ Another rigid body minimization is performed after hydroxyl hydrogens have been added, followed by a second clustering step. A final energy minimization using a very strict convergence criterion, followed by another clustering, produces a list of predicted structures. The program is currently limited to triclinic, monoclinic, and orthorhombic space groups, with a single molecule in the asymmetric unit.

ICE9²⁴ also starts by systematically generating trial crystal packings. In their studies of mostly aromatic hydrocarbons, the authors of this program used quantum mechanically optimized (3-21G basis set⁸⁹) geometries that were kept rigid throughout the calculation.
Following energy minimization, which is based only on interactions between molecules in a central cell with their direct neighbors, the energy is recalculated using a cutoff radius of 10 Å and a molecular multipole expansion for the electrostatic interactions. Predicted structures are sorted by energy for each space group, and the top-ranking structures from each space group are combined into the final set of predicted polymorphs.

The MDCP⁸⁴ (Molecular Dynamics for Crystal Packing) program works by performing a molecular dynamics (MD) run at constant temperature and pressure on a periodic system. The unit cell consists of 4 or 8 rigid molecules (allowing for crystal structures with Z = 1, 2, 4, or 8) and is initially very loosely packed to allow the molecules to change their orientation. During the MD run, low energy structures are stored for minimization at a later stage, thus producing proposed crystal structures, which are finally checked for space group symmetry. Energy calculation is done using the Ewald method for electrostatic interactions and a cutoff radius (typically 14 Å) for the van der Waals terms. The method has been tested with mixed success on the structures of CO₂, benzene, pyrimidine, and 1,2-dimethoxyethane.

Schmidt and Englert⁸⁶ developed a method called CRYSCA (Crystal Structure Calculation) based on rigid body lattice energy minimization of random crystal packings. Their method uses cutoff radii of up to 20 Å or a limited summation including five unit cells in each direction for the nonbonded interactions. The method can handle all space groups and allows molecules to occupy special positions. The method was successfully tested on 25 organic and organometallic compounds, using atomic charges from extended Hückel calculations and van der Waals parameters obtained by carefully combining different published parameter sets.

The MSI Polymorph Predictor⁶⁶ (PP) is based on a four-step method. Sampling via Monte Carlo simulated annealing provides a starting set of trial structures. These are clustered to delete similar structures, minimized to create low energy crystal packings, and once more clustered to remove duplicates. Since the method is under continuous development and the current implementation differs significantly in some places from the procedure originally published, we present it below in relative detail.

During trial structure generation, angular degrees of freedom (the cell angles, the Eulerian angles describing the orientation of the independent molecules in the cell, and the Eulerian angles of the vectors between them) are varied in a Monte Carlo procedure. In each Monte Carlo step, new angular parameters are chosen based on a “move factor,” a number between 0 and 1 that sets the maximum possible change in parameters. A move factor of 1 means that all parts of phase space (all possible combinations of angular parameters) are accessible within one move. Once new angular parameters have been chosen, the translational parameters (cell lengths and distances between independent atoms) are adjusted to relieve close interatomic contacts.
The new trial structure is then accepted or rejected according to the Metropolis algorithm.⁹⁷ That is, its energy $E_{\mathrm{new}}$ is compared to the energy $E_{\mathrm{old}}$ of the last accepted structure, and it is accepted if $\exp[(E_{\mathrm{old}} - E_{\mathrm{new}})/kT]$ is larger than a random number between 0 and 1. This implies that a structure with lower energy than the last accepted one will always be accepted, and that a structure with higher energy has a probability of being accepted depending on the energy increase and the product $kT$, where $k$ is the Boltzmann constant and $T$ the “temperature” of the simulation. During the packing procedure, the molecules are treated as rigid bodies and the temperature is slowly decreased from several thousand kelvins to 300 K. Thus energy barriers are easily overcome in the beginning of the simulation, and there is a gradual steering toward low energy structures as $T$ drops. The move factor described earlier is used to aid the search. It is doubled every time a trial structure is accepted to encourage the search to visit another area of phase space. Every time a structure is rejected, the move factor is halved to get a more detailed sampling of that region of phase space. At all times, the move factor stays in the range of 0 to 1.

Typically, some 2000 trial structures are generated per space group and are clustered in the second step of the prediction. Clustering is based on interatomic distances, which are grouped according to force field atom type, the element, or the name of the atoms. Thus if four different atom types (say, a, b, c, and d) are present in the structure, 10 combinations of two atom types are possible (a-a, a-b, a-c, a-d, b-b, b-c, b-d, c-c, c-d, and d-d), which means that 10 types of interatomic distance are present. For each structure, a list of interatomic distances (within a certain cutoff radius) is made for all combinations of atom types. Clustering is then based on the similarity between the lists generated for different structures. A cluster is formed by taking the lowest energy trial structure, and adding to this cluster all structures having sufficiently similar distance lists. This process is repeated until all trial structures are clustered, or a preset maximum number of clusters has been created. Generally some 250 clusters are formed. The elegance of this clustering algorithm is its speed; a drawback is sometimes poor discrimination.

In the third step, the lowest energy structure from each cluster is subjected to a full-body minimization (i.e., including molecular flexibility) under space group symmetry constraints, using Ewald summation for the van der Waals as well as the electrostatic terms, and a fast second-derivative minimizer. Finally, in the fourth step the minimized structures are clustered once again, to remove duplicate structures. This automated four-step procedure produces possible polymorphs for a given combination of space group, number of molecules in the asymmetric unit, and molecular starting conformations. It has been applied to a number of crystal structures of organic molecules.
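The Metropolis acceptance test and the doubling and halving of the move factor described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Polymorph Predictor implementation: the single-angle `toy_energy` function, the geometric cooling schedule, the 4000 K starting temperature, and the lower bound `floor` on the move factor are all hypothetical choices.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def metropolis_accept(e_old, e_new, kT, rng):
    # Downhill moves are always accepted; uphill moves are accepted when
    # exp((E_old - E_new)/kT) exceeds a random number between 0 and 1.
    if e_new <= e_old:
        return True
    return math.exp((e_old - e_new) / kT) > rng.random()

def update_move_factor(move_factor, accepted, floor=1e-3):
    # Doubled after an acceptance, halved after a rejection, and kept within
    # (0, 1]. The floor is an assumption that avoids a vanishing step size.
    if accepted:
        return min(1.0, 2.0 * move_factor)
    return max(floor, 0.5 * move_factor)

def toy_energy(angle):
    # Hypothetical one-angle energy (kcal/mol) with several minima.
    return 5.0 * (1.0 - math.cos(angle)) + 2.0 * (1.0 - math.cos(3.0 * angle))

def anneal(n_steps=2000, t_start=4000.0, t_end=300.0, seed=0):
    rng = random.Random(seed)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    e_old = toy_energy(angle)
    move_factor = 1.0
    accepted_structures = [(e_old, angle)]
    cooling = (t_end / t_start) ** (1.0 / (n_steps - 1))  # geometric cooling (assumed)
    temperature = t_start
    for _ in range(n_steps):
        # A move factor of 1 makes the whole angular range reachable in one move.
        trial = (angle + move_factor * rng.uniform(-math.pi, math.pi)) % (2.0 * math.pi)
        e_new = toy_energy(trial)
        accepted = metropolis_accept(e_old, e_new, K_B * temperature, rng)
        if accepted:
            angle, e_old = trial, e_new
            accepted_structures.append((e_old, angle))
        move_factor = update_move_factor(move_factor, accepted)
        temperature *= cooling
    return accepted_structures, move_factor

structures, final_move_factor = anneal()
print(len(structures), "accepted structures; lowest energy:", min(structures)[0])
```

Storing every accepted structure, rather than only the final one, provides the starting set that is passed on to the clustering step.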
Related Software

It is worth mentioning other programs that can carry out some, but not all, of the computational tasks in crystal structure prediction. For example, an efficient rigid body, second-derivative crystal packer, PCK83,¹⁰⁰ could be combined with a crude packing generator to produce optimized crystal structures. Such an approach was used by Gavezzotti. Similarly, the DMAREL crystal structure relaxation program, which implements a set of distributed multipoles to model electrostatic interactions, could be utilized. Crystal95, FHI96MD,²⁹ and CASTEP²⁷ are among the programs that can do ab initio quantum mechanical calculations on crystalline materials. Unfortunately, neither Crystal95 nor FHI96MD has the capability to optimize crystal structures.

Computed crystal structures are not always in a standard cell setting (like the reduced cell¹²): the packing process may produce a unit cell with very acute angles, and the assignment of axis labels may be nonstandard. Although this makes no difference to the actual crystal structure, inasmuch as it is merely a matter of choosing between equivalent mathematical descriptions of the structure, a standard description of the lattice is often needed when one is comparing structures, converting data to other formats, and so on. Among the computer programs that can generate reduced cell parameters are PLATON¹⁰¹ and NIST*LATTICE.¹⁰² Space group symmetry can be detected in periodic structures by the symmetry-finding module in Cerius2 by MSI and by a (freely available) modified version of the library program ACMM.¹⁰³ Methods have been published to derive both the conventional cell¹³ and the reduced cell.¹⁵
Le Page, Klug, and Tse¹⁰⁴ describe a method to derive lattice parameters for atomic clusters generated from simulations without periodic constraints. Aimed at simulations of inorganic materials, it relies on visual identification of atom pairs that are related by lattice translation. These are used to derive a primitive cell. Possible symmetry elements are then detected by the program MISSYM.¹⁰⁵,¹⁰⁶
Comparison of Different Techniques

Although crystal structure prediction programs differ widely in their operation, many follow a similar basic approach to the problem. First some form of sampling is carried out to generate a set of trial structures or clusters. Then, from this set, possible crystal structures are generated via packing and energy minimization processes. In these processes, a number of common factors can be identified that influence the speed, accuracy, and completeness of the predictions. A number of these factors are discussed below, and Table 1 gives a summary, to the best of our knowledge, based on the sometimes limited data in the relevant papers.

At the start of a calculation, one or more molecular conformations must be selected. If the simulation is carried out to test a novel prediction method, usually a known crystal structure from the CSD is used. Although it is tempting to use the known, solid state molecular conformation, a more rigorous test
Table 1. Comparison of Characteristic Features of Crystal Structure Prediction Methods

Features (see table notes)

Program      a    b    c    d    e     f    g    h    i
ICE9         S    Y    N    FF   MMP   N    N    Y    Z' = 1
FlexCryst    S    N    N    S    N     N    Y    Y    Z' = 1
MDCP         MD   Y    N    FF   Y     Y    N    N    Z ≤ 4
MOLPAK       S    Y    N    FF   N+Y   N    N    Y    Z' = 1
MPA          R    Y    N    FF   Y     Y    N    N    Z ≤ 4
MSI PP       MC   Y    Y    FF   Y     Y    Y    Y    Unrestricted
PROMET3      S    Y    N    FF   N+Y   Y    Y    Y    Z ≤ 4
CRYSCA       R    Y    N    FF   Y     N    N    Y    Unrestricted
UPACK        S    Y    Y    FF   Y     N    Y    Y    Z' = 1

a Search type: systematic (S)/random (R)/MC/MD.
b Minimization of structures: Y/N.
c Full-body minimization: Y/N.
d Energy function: force field (FF)/scoring function (S).
e Coulombic interactions included: Y/N/(N+Y)/MMP. (N+Y) = in final stages only; MMP = via molecular multipole moments.
f Ewald summation: Y/N.
g Clustering: Y/N.
h Application of symmetry: Y/N.
i Maximum Z or Z' value: 1/2/.../unrestricted.
is to involve one or more minimum energy conformations generated by molecular mechanics or quantum mechanics.

Variables like starting orientations of molecules and initial cell parameters, which are continuous (as opposed to space group and number of molecules in the cell), can be sampled systematically, randomly, or by taking frames from a molecular dynamics simulation or a Monte Carlo procedure. To perform a systematic search on the variables, one must limit the possibilities to certain discrete values. For example, translational parameters can be limited to points on a grid, and rotational parameters can be varied in steps. The step size or density of the grid points will eventually determine the completeness of the deterministic search. Factors defining the expanse of the space to be searched include the number of independent molecules in the unit cell (adding to the number of degrees of freedom), the number of bond rotations, and the size and shape of the molecule. Eventually, with growing complexity of the system, a systematic search will become impractical and random search or Monte Carlo methods more effective. A molecular dynamics simulation is least favorable,¹⁰⁷,¹⁰⁸ because much time is spent near a few minima, making a thorough sampling rather time-consuming.

In many of the approaches described above, a large part of the CPU time is spent on minimization of trial structures. Important factors here are the minimization algorithm, the energy function, and the number of variables to be minimized. In principle, the minimization variables are all atomic coordinates and the cell parameters; their number can be reduced by imposing space group symmetry or by treating the molecules as rigid bodies. If rigid body constraints are imposed, the molecular conformation cannot change in response to packing forces.
If space group symmetry is imposed, the coupling of the movements of symmetry-related atoms may lead to energy minima that become unstable when symmetry constraints are removed.³⁵ Thus space group constraints may lead to false energy minima. Choosing a more tractable energy expression, by neglecting some of its critical terms at appropriate stages of the minimization, offers another way to reduce computational cost, a strategy followed in PROMET3, where electrostatic interactions are initially left out, and in MOLPAK, which omits the attractive part of the van der Waals interactions. To limit the number of atom-atom interactions to be considered in the calculation of van der Waals and electrostatic interactions, a suitable cutoff radius (UPACK) or an Ewald summation (MPA, Polymorph Predictor) may be used, or both (Gibson and Scheraga⁸³). Finally, the use of lookup tables instead of repeated evaluation of interaction functions may speed up computations.
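The combination of a cutoff radius with a lookup table, mentioned above as a way to reduce the cost of evaluating atom-atom interactions, can be sketched as follows. The sketch rests on several assumptions made purely for illustration: a Lennard-Jones 12-6 pair energy with hypothetical parameters, tabulation in the squared distance (so that no square root is needed), linear interpolation, and a 10 Å cutoff.

```python
def lennard_jones(r2, epsilon=0.1, sigma=3.4):
    # 12-6 pair energy as a function of the squared distance r2 (in Å^2).
    # epsilon (kcal/mol) and sigma (Å) are hypothetical parameters.
    s6 = (sigma * sigma / r2) ** 3
    return 4.0 * epsilon * (s6 * s6 - s6)

class PairEnergyTable:
    # Tabulates func on an even grid in r2 and interpolates linearly.
    def __init__(self, func, r2_min, r2_cut, n_bins):
        self.r2_min = r2_min
        self.r2_cut = r2_cut
        self.n_bins = n_bins
        self.step = (r2_cut - r2_min) / n_bins
        self.values = [func(r2_min + i * self.step) for i in range(n_bins + 1)]

    def __call__(self, r2):
        if r2 >= self.r2_cut:
            return 0.0  # beyond the cutoff radius the interaction is ignored
        if r2 < self.r2_min:
            raise ValueError("distance below the tabulated range")
        t = (r2 - self.r2_min) / self.step
        i = min(int(t), self.n_bins - 1)  # guard against rounding at the upper edge
        frac = t - i
        return (1.0 - frac) * self.values[i] + frac * self.values[i + 1]

# Table from 3 Å to a 10 Å cutoff (squared distances 9 to 100 Å^2).
table = PairEnergyTable(lennard_jones, r2_min=9.0, r2_cut=100.0, n_bins=10_000)

r2 = 4.0 ** 2
print("direct:", lennard_jones(r2), " table:", table(r2))
```

The table trades memory for speed: each lookup costs one division and one interpolation, whatever the complexity of the tabulated function. The grid spacing determines the interpolation error, so it would need to be checked against the accuracy the minimization requires.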
Using Experimental Data

If the crystal structure of a particular observed polymorph is to be determined, experimental data of several types can be used at different stages of the prediction. Powder diffraction data as well as other data (from, e.g., solid state NMR or IR spectroscopy) can be used in setting up the simulation. Spectroscopic data may provide information on the orientation of the molecules in the crystal and help in choosing the initial conformations of the molecules in the prediction. Solid state NMR data can be used to determine the number of molecules in the crystal that are not related by symmetry, and thereby the number of molecules in the asymmetric unit.

Cell parameters can be obtained from a good powder diffraction pattern by searching for a set of cell axes and angles that would produce reflections at the observed diffraction angles. Such a procedure, called “indexing,” can be carried out mostly automatically with programs like TREOR90¹¹⁰ or DICVOL91.¹¹¹ Once the cell parameters have been established, the cell volume provides an estimate for Z, the number of molecules in the cell. Based on Z and the observed cell parameters, the space groups that are most likely can be identified. For example, if all cell angles are 90°, all cell axes have different lengths, and Z is estimated to be 4, the space group is most probably P2₁2₁2₁. If cell angles and cell lengths all have different values, and Z equals 2, space group P1̄ is most likely.

In a later stage of the prediction, powder diffraction data can be compared with simulated powder patterns of proposed polymorphs for identification purposes. A model structure that is close to the experimental one will produce a similar pattern and can then be further refined by means of the Rietveld method.¹¹²
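The estimate of Z from the cell volume mentioned above can be made concrete with a short sketch. The volume formula for a general (triclinic) cell is standard; the molecular volume, however, is an assumption here: the sketch uses the common rule of thumb of roughly 18 Å³ per non-hydrogen atom, and the function names and example cell are hypothetical.

```python
import math

def cell_volume(a, b, c, alpha, beta, gamma):
    # Volume (Å^3) of a unit cell from its axis lengths (Å) and angles (degrees).
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)

def estimate_z(volume, n_non_h_atoms, volume_per_atom=18.0):
    # Number of molecules per cell, assuming about 18 Å^3 per non-hydrogen
    # atom (a rule of thumb, not a value taken from the text).
    return max(1, round(volume / (n_non_h_atoms * volume_per_atom)))

# Hypothetical indexed cell: all angles 90 degrees, three different axis lengths.
volume = cell_volume(5.0, 10.0, 14.4, 90.0, 90.0, 90.0)
z = estimate_z(volume, n_non_h_atoms=10)
print(f"V = {volume:.1f} Å^3, estimated Z = {z}")
```

For this hypothetical cell the estimate is Z = 4, which, together with three unequal axes and 90° angles, corresponds to the P2₁2₁2₁ case described in the text.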
PREDICTING AND EVALUATING CRYSTAL STRUCTURES

The first steps of a crystal structure prediction usually involve choosing the starting molecular conformation(s) and calculating a set of charges. A complete conformational analysis must be carried out to obtain a good set of conformations that is within an acceptable energy range. Alternatively, the CSD can often provide crystal structures of similar compounds, from which feasible molecular conformations or geometries of ionic complexes can be extracted. Semiempirical or ab initio quantum mechanical methods are then applied to derive a charge distribution (as explained earlier) and to optimize the molecular geometry if necessary. Typically, ESP-derived charges, based on Hartree-Fock calculations using a 6-31G* or 6-31G** basis set, are used. Although MNDO charges, scaled by an appropriate factor, may provide a computationally less expensive alternative, it should be kept in mind that the charge calculation generally takes only a small part of the total computation time spent in a polymorph search. Since, however, these charges may have a large influence on the calculated energy, the time needed for a more elaborate charge calculation may be well spent. Another option is to use a force field such as CFF,¹¹⁴ for which atomic charges have been optimized together with the other parameters in the force field fitting procedure.

Crystal structures are usually predicted in separate runs for given combinations of space group and number of independent molecules in the unit cell, Z'. Although the explicit use of space group symmetry is, in principle, unnecessary if predictions with a suitable number of independent molecules in the cell are carried out, it will generally be more efficient to use space group symmetry and a small number of independent molecules instead, because this can reduce the number of variables drastically. Choosing space groups for structure prediction and estimating the number of independent molecules to use can be facilitated by using data from the CSD: statistical analysis¹¹⁵,¹¹⁶ of the CSD shows that five space groups (P2₁/c, P1̄, P2₁2₁2₁, C2/c, P2₁) account for approximately 78% of all crystal structures of organocarbon compounds in that database, and only 8.3% of all structures have more than one formula unit in the asymmetric unit. Optically pure chiral compounds obviously cannot crystallize in space groups that contain a mirror or an inversion operation. These compounds can crystallize only in a subset of 65 out of the 230 space groups. Their distribution over these 65 space groups is similar to the distribution of all compounds over this subset: 78% of the chiral compounds crystallize in either P2₁2₁2₁ or P2₁.

Keep in mind that these numbers reflect the distribution of solved crystal structures for rather nonspecific sets of molecules. Structures that are less readily solved experimentally, such as those containing more than one (nonsolvent) molecule in the asymmetric unit (Z' > 1), may be underrepresented in the database. Particular subsets of structures may also deviate significantly from trends generally observed: for example,¹¹⁷ 40% of the alcohol crystals have Z' > 1.
Still, as a general rule, prediction of crystal structures with a single molecule in the asymmetric unit in P2₁/c, P1̄, P2₁2₁2₁, C2/c, and P2₁ for nonchiral compounds; in P2₁/c, P1̄, and C2/c for racemates; and in P2₁2₁2₁ and P2₁ for optically pure chiral compounds, is, statistically, a logical place to start.
The first result from a crystal structure prediction usually involves a large set of crystal structures for the complete range of space groups and Z values considered. An example is given in Figure 4, where energy and density of predicted crystal structures for acetic acid are plotted in a scatter diagram. Often hundreds of crystal packings are predicted. How relevant are all these minimum energy structures? One criterion for discarding proposed structures is the calculated energy: within 3 kcal/mol of the global minimum seems to be a reasonable acceptance range for crystals of typical organic molecules found in the CSD,16 although this number will depend on the size of the molecule. This cutoff means that, in principle, one can discard all predicted structures more than 3 kcal/mol above the lowest energy structure predicted. The appropriate range also depends on how well the force field performs for the molecule; adjustments to standard force fields may be necessary (and feasible) in some cases, as in the determination of the structure of 4-amidinoindanone guanylhydrazone by Karfunkel et al.,118 where equilibrium bond distances and angles were shifted to values obtained in ab initio calculations.
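The energy criterion described above can be illustrated with a short sketch. It is only a minimal illustration, assuming that each predicted packing is available as a record with a lattice energy in kcal/mol; the record type, the labels, and the numerical values are hypothetical and do not come from any prediction program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedStructure:
    label: str        # hypothetical identifier for a predicted packing
    space_group: str
    energy: float     # calculated lattice energy, kcal/mol
    density: float    # calculated density (illustrative values only)


def within_energy_window(structures, window=3.0):
    """Keep structures no more than `window` kcal/mol above the lowest energy found."""
    if not structures:
        return []
    e_min = min(s.energy for s in structures)
    kept = [s for s in structures if s.energy - e_min <= window]
    return sorted(kept, key=lambda s: s.energy)


# Hypothetical example data, not taken from the acetic acid study.
candidates = [
    PredictedStructure("a", "P21/c", -43.5, 1.31),
    PredictedStructure("b", "Pna21", -42.9, 1.29),
    PredictedStructure("c", "P-1", -39.8, 1.20),
]
survivors = within_energy_window(candidates)
```

The default window of 3.0 kcal/mol follows the acceptance range quoted in the text; because that range depends on molecular size and force field quality, it is exposed as a parameter rather than fixed.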
Figure 4 Energy and density of predicted crystal structures of acetic acid (scatter plot; horizontal axis: energy in kcal/mol; vertical axis: density).
If a Monte Carlo search procedure is used, it often proves more efficient to perform several short simulations rather than one long simulation for each space group. The overlap (or the lack thereof) in the identified low energy structures from a series of short runs is a good indication of the effectiveness of the search; if the second or third simulation in a series does not provide any new low energy structures, it can usually be assumed that all relevant minima have been found. If only one long simulation is performed, it is much more difficult to establish the same level of confidence.
After the energy cutoff has been applied, a large number of predicted structures may yet remain. Their number can be further reduced for the following reasons:
1. The same packing structure may have been predicted in different space groups. One example is the prediction of a structure in P1, Z = 2, as well as in P2₁, Z = 1 (which, on the other hand, can be taken as an indicator of the completeness of the search). Intramolecular symmetry elements may lead to the prediction of the same structure in different space groups, with the same number of independent molecules. For example, certain crystal packings of acetic acid, which is mirror-symmetric, can be predicted in both P2₁/c, Z = 4, and P2₁2₁2₁, Z = 4 (both structures have a single independent molecule in the cell). A suitable clustering of the complete set of predicted structures should thus remove duplicate structures of this type.
2. Minimizations are often performed based on an assumed space group symmetry and a certain value of Z. Thus, constraints are imposed that may lead to structures that do not correspond to local energy minima with respect to the degrees of freedom that were constrained. Minimization using a superlattice (no symmetry imposed on the contents of the unit cell) or a supercell (a new cell made of two or more original cells) may lead to a lower energy minimum. Note that this artifact of imposing space group symmetry does not prevent correct crystal structures from being found; it merely leads to the generation of additional, unrealistic crystal structures.
3. Some of the energy minima may be separated from lower minima by a small barrier that would be easily overcome in reality. In practice, these metastable structures would therefore not correspond to stable polymorphs. A brief molecular dynamics simulation on all structures might overcome the small energy barriers.51
4. Even if a force field suitable for the particular class of molecules is used, its limitations may lead to artificial energy minima.58 Recalculation of the low energy structures with a different force field (one that is also expected to work well for the molecule in question) may then eliminate spurious structures.
Finally, if the structure of an experimentally observed polymorph is to be determined, powder diffraction data may help to identify the true crystal structure, as mentioned earlier. A flowchart describing the general procedure is given in Figure 5.
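The run-overlap test for Monte Carlo searches described above can be sketched as follows. This is a simplified illustration, not the procedure of any particular program: each structure is reduced to a hypothetical (energy, density) pair, and two structures are treated as the same packing when both values agree within assumed tolerances. A real clustering step would compare the packings themselves.

```python
def new_low_energy_structures(previous_runs, current_run, e_tol=0.05, d_tol=0.01):
    """Return the structures of current_run that no earlier run has produced.

    Each structure is an (energy, density) tuple. Matching on these two
    numbers is a crude stand-in for a proper structural comparison, and the
    tolerances are assumed values chosen for illustration.
    """
    seen = [s for run in previous_runs for s in run]

    def matches(a, b):
        return abs(a[0] - b[0]) <= e_tol and abs(a[1] - b[1]) <= d_tol

    return [s for s in current_run if not any(matches(s, t) for t in seen)]


# Hypothetical results of three short runs in one space group.
run1 = [(-43.50, 1.310), (-42.90, 1.290)]
run2 = [(-43.49, 1.311), (-42.10, 1.250)]
run3 = [(-43.51, 1.309), (-42.11, 1.251)]

found_in_second = new_low_energy_structures([run1], run2)
found_in_third = new_low_energy_structures([run1, run2], run3)
search_converged = not found_in_third
```

When a further run contributes nothing new, as in the third run here, the text suggests that the relevant minima have probably been located.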
Example: Polymorph Prediction for Estrone
As an illustration of the procedure given above, we describe a prediction of possible polymorphs for the steroid estrone. The structures of three polymorphs of estrone are available in the CSD,119,120 one structure in space group P2₁, Z = 4, Z′ = 2 (two independent molecules in the asymmetric unit), and two structures in P2₁2₁2₁, both with Z = 4, Z′ = 1. The structure of the estrone molecule is given in Figure 6. We will assume the crystal structures to be unknown and indicate how experimental data could help predict the correct polymorphs.
Figure 5 Flowchart describing the general polymorph prediction process. (Flowchart elements: experimental data such as IR-Raman, solid state NMR, elemental analysis, and crystal structures of other polymorphs; conformational analysis or conformers from crystal structures; geometry and charge calculation; information on space group and unit cell dimensions; parameterization of the force field if necessary; polymorph search per space group and per conformer; analysis of low energy, high density structures; comparison with the experimental powder pattern and Rietveld refinement; prediction of physico-chemical solid state properties.)
First, we build models of the molecule to be used in the prediction. The molecule is sketched and optimized using a suitable force field. A search in the CSD for steroids with an identical ring skeleton reveals that this type of skeleton is rather rigid because of the aromaticity of the first ring. The skeleton of our optimized molecule should fit reasonably well onto those of the experimental structures. Although the skeleton is rigid, there is some conformational flexibility in the molecule: the hydroxyl group has two possible orientations, both in the plane of the aromatic ring. The orientation of this group will have a profound influence on crystal structures, inasmuch as it plays an important role in the formation of hydrogen bonds. The barrier between the two conformations, however, is too high to be overcome during energy minimization. We will therefore have to carry out separate prediction runs for both rotamers. The two rotamers are next optimized using quantum mechanical methods, and ESP charges are calculated for the optimized structures. We now have our starting models.
Figure 6 Structural formulas of estrone, acetaminophen, 4-amidinoindanone guanylhydrazone (AIGH), acetic acid, benzene, prednisolone tert-butylacetate, and quinacridone.
The following step is to decide on the space groups in which to predict packing structures and on the number of independent molecules in the cell. Since we have a chiral molecule, and the compound is optically pure, it can crystallize only in space groups that lack a mirror or inversion center. The most common ones are P2₁, P2₁2₁2₁, and P1. If powder diffraction patterns are available, those may be used to obtain cell parameters. For the P2₁ structure, such patterns would indicate a primitive cell with two right angles and a volume corresponding to four molecules in the cell, so one would start with predictions in P2₁, Z′ = 2.
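The choice of runs can be written down as a small planning step. The sketch below is illustrative only: it encodes the statistical starting rule given earlier in the chapter as a lookup table and pairs each space group with each starting conformer, since the search is carried out per space group and per conformer. The class names, conformer labels, and the function `search_plan` are hypothetical.

```python
from itertools import product

# Starting space groups per compound class, following the general rule in the text.
STARTING_SPACE_GROUPS = {
    "achiral": ["P21/c", "P-1", "P212121", "C2/c", "P21"],
    "racemate": ["P21/c", "P-1", "C2/c"],
    "optically_pure": ["P212121", "P21"],
}


def search_plan(compound_class, conformers, extra_space_groups=()):
    """List the (space group, conformer) combinations to be run separately."""
    groups = list(STARTING_SPACE_GROUPS[compound_class])
    for group in extra_space_groups:
        if group not in groups:
            groups.append(group)
    return list(product(groups, conformers))


# Estrone-like case: optically pure, two hydroxyl rotamers, with P1 added
# because the text lists it among the common space groups for chiral molecules.
plan = search_plan(
    "optically_pure", ["rotamer_A", "rotamer_B"], extra_space_groups=["P1"]
)
```

Experimental information, such as cell parameters from a powder pattern, would normally shorten this list further.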
The powder patterns of the other polymorphs would indicate a primitive cell with all right angles and four molecules in the unit cell. In this case, the space group is most likely P2₁2₁2₁ with Z′ = 1. Other possibilities exist (e.g., P2₁2₁2, Z′ = 1) but are statistically less probable. At this stage, one could use solid state NMR data to determine the number of independent molecules in the cell, Z′, which in all three cases would be in agreement with the most probable space group and Z′ combinations.
Next, crystal structures can be predicted with the program of choice. Taking the predictions for P2₁2₁2₁ as an example, we would obtain two sets of predicted structures, one for each starting conformation of the molecule. Because we have used different point charges in the calculation of these sets of structures, we cannot directly compare the MM energies of structures in different sets. To obtain energies that can be compared, one must use structures that have been optimized using the same point charges. This can be done by transferring the charges used for one conformer to the structures of the other conformer and minimizing those structures once more, or by calculating charges suitable for both conformers (the average of the charges calculated for the different conformers could be used) and minimizing all structures using these charges. Finally, powder diffraction patterns may be calculated for low energy structures and compared to the experimentally obtained patterns. Simulated patterns for one of the experimentally observed estrone polymorphs and for the corresponding predicted structure are given in Figure 7a and 7b. A superposition of the two structures is given in Figure 8.
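The charge-averaging option mentioned above amounts to a per-atom mean over the conformers. The following sketch assumes that the charge sets list the atoms in the same order; the three-atom charge values are hypothetical and serve only to show the arithmetic.

```python
def average_charges(charge_sets):
    """Average atomic point charges over conformers, atom by atom.

    charge_sets is a list of equally long lists of charges, one list per
    conformer, with atoms in identical order (an assumption of this sketch).
    """
    if not charge_sets:
        raise ValueError("at least one charge set is required")
    n_atoms = len(charge_sets[0])
    if any(len(charges) != n_atoms for charges in charge_sets):
        raise ValueError("all conformers must list the same atoms")
    n_sets = len(charge_sets)
    return [sum(charges[i] for charges in charge_sets) / n_sets for i in range(n_atoms)]


# Hypothetical ESP charges for the same three atoms in two rotamers.
rotamer_a = [-0.60, 0.40, 0.20]
rotamer_b = [-0.64, 0.44, 0.20]
common = average_charges([rotamer_a, rotamer_b])
```

Because every input set sums to the same net molecular charge, the averaged set does so as well; all structures would then be reminimized with these common charges before their energies are compared.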
APPLICATION EXAMPLES
In the literature, crystal structure predictions are presented on compounds with known crystal structures (for testing purposes) as well as on compounds for which no experimental data were available or for which only limited data exist (usually X-ray powder diffraction patterns). Rigid molecules composed of (aromatic) ring systems, as well as structures selected randomly from the CSD, are popular for testing purposes. If the molecules are flexible, usually the conformation observed in the true crystal structure is used in the prediction, which will bias the results. Exceptions include the crystal structure prediction for pigment red,65 predictions on monosaccharides, where six torsional angles are assumed unknown,85 and the 4-amidinoindanone guanylhydrazone example discussed below.118 Reports focusing on as yet unknown crystal structures, based on computer simulations, are still rare. One reason for their scarcity is that the field is relatively new; in addition, this type of work often takes place in an industrial environment, where the application examples remain confidential for some time. The sections that follow list a number of application examples. The structural formulas of the compounds involved are given in Figure 6.
Figure 7 The X-ray powder diffraction patterns of one of the experimentally observed polymorphs of estrone (a) and of the corresponding predicted structure (b). (Both panels: X-ray radiation, wavelength 1.5418 Å; horizontal axis: diffraction angle; vertical axis: intensity.)
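As a small worked illustration of where the peaks in such a simulated pattern fall, the sketch below applies Bragg's law, λ = 2d sin θ, to an orthorhombic cell, for which 1/d² = h²/a² + k²/b² + l²/c². It gives peak positions only; intensities and systematic absences are not treated. The cell edges are hypothetical, and the default wavelength is the value shown in Figure 7.

```python
import math


def two_theta_orthorhombic(h, k, l, a, b, c, wavelength=1.5418):
    """Diffraction angle 2-theta (degrees) of reflection hkl for an orthorhombic cell.

    a, b, c and wavelength are in angstroms. Uses 1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2
    and Bragg's law, wavelength = 2 d sin(theta).
    """
    inv_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    if inv_d_squared == 0:
        raise ValueError("hkl = 000 is not a reflection")
    d = 1.0 / math.sqrt(inv_d_squared)
    sin_theta = wavelength / (2.0 * d)
    if sin_theta > 1.0:
        raise ValueError("reflection cannot be observed at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


# Hypothetical cell with a = b = c = 10 angstroms.
peak_100 = two_theta_orthorhombic(1, 0, 0, 10.0, 10.0, 10.0)
peak_200 = two_theta_orthorhombic(2, 0, 0, 10.0, 10.0, 10.0)
```

Comparing such calculated positions with an indexed experimental pattern is one quick check on predicted cell dimensions before a full pattern comparison or Rietveld refinement is attempted.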
Figure 8 Superposition of an experimentally observed polymorph of estrone and the corresponding predicted structure.
The ab initio prediction of the crystal packing of 4-amidinoindanone guanylhydrazone (AIGH) by Karfunkel et al.118 provides an example of the determination of a previously unknown crystal packing by polymorph prediction. This compound was proposed by scientists at Ciba-Geigy (Novartis) as an anticancer drug. Two solvent-free polymorphs, called A and B, were observed experimentally. The crystal structure of B could be solved by single crystal X-ray diffraction, but no crystals of sufficient quality could be grown for polymorph A. However, a powder diffraction pattern of A could be used to obtain reduced lattice parameters and the number of molecules in the unit cell, indicating two possibilities for the space group: P1, Z′ = 2, and P1̄, Z′ = 1, the latter being the more probable. The molecule may exist in two tautomeric forms. After energy calculations on different conformers of the two tautomers, four conformers of the most stable tautomer were selected for use in packing predictions in P1̄, using MSI's Polymorph Predictor. In these calculations, the DREIDING-2.21 force field121 was used, with some corrections made to the parameters to permit reproduction of the optimized geometries obtained in ab initio calculations. The predicted crystal structure with lowest energy had cell parameters close to those derived from the powder diffraction pattern, and a satisfactory agreement between the observed and simulated powder patterns could be achieved using Rietveld refinement. Thus, both the conformation of AIGH and the packing of the A polymorph, which were unknown before, were determined.
Figure 9 Experimentally observed polymorphs of acetaminophen. Left: monoclinic form (CSD ref. code: HXACAN01). Right: orthorhombic form (CSD ref. code: HXACAN).
Another example concerns the analgesic p-hydroxyacetanilide, better known as acetaminophen and marketed as Tylenol (in the United States) and Paracetamol (elsewhere). This substance has two anhydrous polymorphs. The monoclinic form122 is known to be considerably more stable than the orthorhombic modification.123 Successful crystallization of the latter metastable form has been reported only a few times. Both crystal structures (see Figure 9) involve a two-dimensional hydrogen-bonding motif in the lattice. A study was undertaken, as part of a validation project of MSI's Polymorph Predictor, to determine whether any other polymorphs of this compound could be expected. The polymorph search was performed in 17 space groups (together accounting for more than 95% of known molecular crystal structures) with one molecule in the asymmetric unit. Both experimentally known crystal structures were predicted in the correct stability order. The search, using the DREIDING-2.21 force field in combination with semiempirical MNDO-ESP atomic charges, also suggested a third potential polymorph in the P2₁2₁2₁ space group to be about as stable as the metastable orthorhombic form. However, closer inspection of that structure revealed a distinct hydrogen-bonding pattern, which is incorrectly favored by the DREIDING force field. Recalculation of the lattice energetics with a force field more suitable for distinguishing subtle differences in hydrogen-bonding patterns (CFF114) established that this third structure is actually too unstable to exist.
Recently, Payne et al.78 applied the Polymorph Predictor to acetic acid and halogenated analogs thereof. As starting molecular geometries, they used either the known, experimental conformation, the STO-3G optimized structure, or the 6-31G** (ref. 89) optimized structure.
For each geometry, 6-31G** ESP atomic charges were calculated and used in the predictions. In all cases, a packing was found that corresponded to the energy-minimized crystal structure, as well as several packings with an even lower energy (within 1 kcal/mol). The authors attributed this apparent error in the relative energies of different packings to shortcomings in the DREIDING-2.21 force field, specifically in its hydrogen bond potential. One of their predicted structures in P2₁/c could be refined reasonably well to agree with the X-ray powder diffraction pattern of a high pressure form of acetic acid reported by Bertie and Wilton, for which no experimental crystal structure is known. Thus, the authors78 were able to construct a model that is a good representation of the crystal structure, in the absence of single crystal diffraction data.
Benzene is a popular validation case for crystal structure prediction methods. Two polymorphic forms of the compound are known: a stable one with Z = 4 (ref. 126) and a high pressure, metastable form with Z = 2 (ref. 127). When MSI's Polymorph Predictor is used with the DREIDING-2.21 force field and atomic charges of -0.15 on carbon atoms and +0.15 on hydrogen atoms, both polymorphs can easily be predicted. Because the simulation does not consider pressure, the unit cell vectors of the high pressure polymorph are predicted too long. It is also possible to predict both forms in a single simulation by performing a search in space group P1, with Z = 4. Note, however, that such a prediction with multiple molecules in the asymmetric unit is feasible only for simple, highly symmetrical compounds like benzene. For asymmetrical compounds, such simulations can prove unfeasible because of the large number of degrees of freedom due to the additional molecules in the asymmetric unit.
The steroid prednisolone-t-butylacetate was patented as a glucocorticoid by Merck & Company in 1962. Two anhydrous polymorphs and five solvent-containing "pseudopolymorphs" are known,128 but the crystal structure of one anhydrous form has never been determined experimentally. Indexing of the experimental powder pattern of this elusive polymorph indicates that the crystal belongs to the P2₁2₁2₁ space group. Simulations with the MSI Polymorph Predictor, considering two of the low energy conformations of the compound and using the DREIDING-2.21 force field with MNDO-ESP atomic charges, find a stable P2₁2₁2₁ polymorph with a simulated powder pattern very similar to the experimental pattern.129 Subsequent Rietveld refinement confirmed that this predicted structure is the previously undetermined anhydrous polymorph of prednisolone-t-butylacetate.
Quinacridone and its derivatives represent one of the most important classes of organic pigments, both in terms of annual production and in terms of wide ranging applications. The excellent performance of these pigments is explained by their thermal stability, weather resistance, and lightfastness.130 The parent compound quinacridone can crystallize in at least three polymorphs,131 each having a different color and a different set of application properties.
Although the experimental crystal structure of the most stable γ polymorph has been reported recently,132 rational control over the system is still impeded by a lack of knowledge of the crystal structures of the α and β forms. The MSI Polymorph Predictor (with the DREIDING-2.21 force field and ab initio 6-31G** ESP atomic charges) predicted all possible crystal forms of quinacridone.133 Powder patterns simulated for the three most stable predicted structures compare well with experimental powder patterns recorded for the three polymorphs of quinacridone. Rietveld refinement of the predicted structures was successful, confirming that the structures of the previously undetermined α and β polymorphs have now been solved.
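The remark in the benzene example about the number of degrees of freedom can be made concrete with a rough count. The sketch below is an assumption-laden estimate, not a formula from the chapter: it treats each independent molecule as a rigid body on a general position (three translations and three rotations) and adds the free lattice parameters of the crystal system. It ignores the freedom to fix the cell origin and any internal torsions, so the numbers are indicative only.

```python
# Free lattice parameters per crystal system (cell edges and angles not fixed by symmetry).
LATTICE_PARAMETERS = {
    "triclinic": 6,
    "monoclinic": 4,
    "orthorhombic": 3,
    "tetragonal": 2,
    "hexagonal": 2,
    "cubic": 1,
}


def rigid_body_variables(crystal_system, n_independent):
    """Rough count of search variables for rigid molecules on general positions."""
    if n_independent < 1:
        raise ValueError("at least one independent molecule is required")
    return LATTICE_PARAMETERS[crystal_system] + 6 * n_independent


p1_four_molecules = rigid_body_variables("triclinic", 4)
orthorhombic_one_molecule = rigid_body_variables("orthorhombic", 1)
```

On this count, a search in P1 with four independent molecules has roughly three times as many variables as an orthorhombic search with one independent molecule, before any conformational flexibility is considered.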
ACKNOWLEDGMENTS
We acknowledge the support of this work by the Computational Materials Science Crystallization project, a Dutch research collaboration with academic and industrial partners, focusing on precompetitive research into modeling, packing, morphology, and industrial crystallization of organic compounds. Project information is accessible at http://www.caos.kun.nl/cmsc/. The industrial partners contributing to the project funding are Akzo-Nobel, Organon, Unilever, and DSM. Additional funding support is obtained from the Netherlands Organization for Scientific Research (NWO) and the Netherlands Foundation for Chemical Research (SON).
REFERENCES
1. J. D. Dunitz and J. Bernstein, Acc. Chem. Res., 28, 193 (1995). Disappearing Polymorphs.
2. G. H. Stout and L. H. Jensen, X-Ray Structure Determination, Macmillan, New York, 1970. (a) Chapter 3. (b) Chapter 2.
3. A. I. Kitaigorodskii, Organic Chemical Crystallography, Consultants Bureau, New York, 1961.
4. D. E. Williams, Acta Crystallogr., 21, 340 (1966). Crystal Structure of Dibenzoylmethane.
5. D. E. Williams, Trans. Am. Cryst. Assoc., 6, 21 (1970). Computer Calculation of the Structure and Physical Properties of the Crystalline Hydrocarbons.
6. D. E. Williams, Acta Crystallogr., Sect. A, 25, 464 (1969). A Method of Calculating Molecular Crystal Structures.
7. D. E. Williams, Acta Crystallogr., Sect. B, 29, 96 (1973). Crystal Structure of 2,4,6-Triphenylverdazyl.
8. P. Zugenmaier and A. Sarko, Acta Crystallogr., Sect. B, 28, 3158 (1972). Packing Analysis of Carbohydrates and Polysaccharides. 1. Monosaccharides.
9. C. P. Brock and J. A. Ibers, Acta Crystallogr., Sect. B, 29, 2426 (1973). Conformational Analysis of the Triphenylphosphine Molecule in the Free and Solid States.
10. W. R. Busing, Acta Crystallogr., Sect. A, 28, S252 (1972). A Computer Program to Aid in the Understanding of Interatomic Forces in Molecules and Crystals.
11. A. Kvick and J. H. Noordik, Acta Crystallogr., Sect. B, 33, 2862 (1977). Hydrogen Bond Studies. CXXI. Structure Determination of 2-Amino-4-methylpyridine by Molecular Packing Analysis and X-Ray Diffraction.
12. T. Hahn, Ed., International Tables for Crystallography, Reidel, Dordrecht, 1983, Vol. A, pp. 734-744.
13. Y. Le Page, J. Appl. Crystallogr., 15, 255 (1982). The Derivation of the Axes of the Conventional Unit Cell from the Dimensions of the Buerger-Reduced Cell.
14. L. C. Andrews and H. J. Bernstein, Acta Crystallogr., Sect. A, 44, 1009 (1988). Lattices and Reduced Cells as Points in 6-Space and Selection of Bravais Lattice Type by Projections.
15. L. Zuo, J. Muller, M.-J. Philippe, and C. Esling, Acta Crystallogr., Sect. A, 51, 943 (1995). A Refined Algorithm for the Reduced-Cell Determination.
16. A. Gavezzotti and G. Filippini, J. Am. Chem. Soc., 117, 12299 (1995). Polymorphic Forms of Organic Crystals at Room Conditions: Thermodynamic and Structural Implications.
17. K. D. Gibson and H. A. Scheraga, J. Phys. Chem., 99, 3765 (1995). Crystal Packing Without Symmetry Constraints. 2. Possible Crystal Packings of Benzene Obtained by Energy Minimization from Multiple Starts.
18. G. Filippini and C. M. Gramaccioli, Acta Crystallogr., Sect. B, 42, 605 (1986). Thermal Motion Analysis in Tetraphenylmethane: A Lattice-Dynamical Approach.
19. I. Weissbuch, R. Popovitz-Biro, M. Lahav, and L. Leiserowitz, Acta Crystallogr., Sect. B, 51, 115 (1995). Understanding and Control of Nucleation, Habit, Dissolution and Structure of Two- and Three-Dimensional Crystals Using 'Tailor-Made' Auxiliaries.
20. R. J. Davey, S. J. Maginn, S. J. Andrews, S. N. Black, A. M. Buckley, D. Cottier, P. Dempsey, R. Plowman, J. E. Rout, D. R. Stanley, and A. Taylor, J. Chem. Soc., Faraday Trans., 90, 1003 (1994). Morphology and Polymorphism in Molecular Crystals: Terephthalic Acid.
21. J. P. Bowen and N. L. Allinger, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991, Vol. 2, pp. 81-97. Molecular Mechanics: The Art and Science of Parameterization.
22. U. Dinur and A. T. Hagler, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991, Vol. 2, pp. 99-164. New Approaches to Empirical Force Fields.
23. I. Pettersson and T. Liljefors, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1996, Vol. 9, pp. 167-189. Molecular Mechanics Calculated Conformational Energies of Organic Molecules: A Comparison of Force Fields.
24. A. M. Chaka, R. Zaniewski, W. Youngs, C. Tessier, and G. Klopman, Acta Crystallogr., Sect. B, 52, 165 (1996). Predicting the Crystal Structure of Organic Molecular Materials.
25. F. H. Allen and O. Kennard, Chem. Design Autom. News, 8, 31 (1993). 3D Search and Research Using the Cambridge Structural Database. The URL is http://www.ccdc.cam.ac.uk/.
26. J. R. Holden, Z. Du, and H. Ammon, J. Comput. Chem., 14, 422 (1993). Prediction of Possible Crystal Structures for C-, H-, N-, O-, and F-Containing Organic Compounds.
27. M. C. Payne, M. P. Teter, D. C. Allan, T. A. Arias, and J. D. Joannopoulos, Rev. Mod. Phys., 64, 1045 (1992). Iterative Minimization Techniques for Ab Initio Total-Energy Calculations: Molecular Dynamics and Conjugate Gradients.
28. R. Dovesi, V. R. Saunders, C. Roetti, M. Causà, N. M. Harrison, R. Orlando, and E. Apra, Crystal-Electronic Structure of Periodic Systems, User Manual (1996). The URL is http://gserv1.dl.ac.uk/TCSC/Software/CRYSTAL/.
29. R. Stumpf and M. Scheffler, Comput. Phys. Commun., 79, 447 (1994). Simultaneous Calculation of the Equilibrium Atomic Structure and its Electronic Ground State Using Density-Functional Theory. The URL is http://www.fhi-berlin.mpg.de/th/fhi96md/code.html.
30. D. W. M. Hofmann and T. Lengauer, Acta Crystallogr., Sect. A, 53, 225 (1997). A Discrete Algorithm for Crystal Structure Prediction of Organic Molecules.
31. J. Gasteiger and M. Marsili, Tetrahedron, 36, 3219 (1980). Iterative Partial Equalization of the Orbital Electronegativity - A Rapid Access to Atomic Charges.
32. A. K. Rappe and W. A. Goddard III, J. Phys. Chem., 95, 3358 (1991). Charge Equilibration for Molecular Dynamics Simulations.
33. R. S. Mulliken, J. Chem. Phys., 23, 1833 (1955). Electronic Population Analysis on LCAO-MO (Linear Combination of Atomic Orbitals-Molecular Orbital) Molecular Wave Functions.
34. F. A. Momany, J. Phys. Chem., 82, 592 (1978). Determination of Partial Atomic Charges from Ab Initio Molecular Electrostatic Potentials. Application to Formamide, Methanol, and Formic Acid.
35. S. R. Cox and D. E. Williams, J. Comput. Chem., 2, 304 (1981). Representation of the Molecular Electrostatic Potential by a Net Atomic Charge Model.
36. D. E. Williams, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991, Vol. 2, pp. 219-271. Net Atomic Charge and Multipole Models for the Ab Initio Molecular Electric Potential.
37. S. M. Bachrach, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1994, Vol. 5, pp. 171-227. Population Analysis and Electron Densities from Quantum Mechanics.
38. J. J. P. Stewart, MOPAC93, Fujitsu Limited, Tokyo, 1993. The URL is http://www.fujitsu.com/.
39. M. J. Frisch, G. W. Trucks, H. B. Schlegel, P. M. W. Gill, B. G. Johnson, M. A. Robb, J. R. Cheeseman, T. Keith, G. A. Petersson, J. A. Montgomery, K. Raghavachari, M. A. Al-Laham, V. G. Zakrzewski, J. V. Ortiz, J. B. Foresman, J. Cioslowski, B. B. Stefanov, A. Nanayakkara, M. Challacombe, C. Y. Peng, P. Y. Ayala, W. Chen, M. W. Wong, J. L. Andres, E. S. Replogle, R. Gomperts, R. L. Martin, D. J. Fox, J. S. Binkley, D. J. DeFrees, J. Baker, J. P. Stewart, M. Head-Gordon, C. Gonzalez, and J. A. Pople, Gaussian 94, Gaussian Inc., Pittsburgh, PA, 1995.
40. M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. Su, T. L. Windus, M. Dupuis, and J. A. Montgomery, J. Comput. Chem., 14, 1347 (1993). General Atomic and Molecular Electronic Structure System.
41. M. Guest, J. Kendrick, J. van Lenthe, K. Schoeffel, and P. Sherwood, GAMESS-UK Users Guide and Reference Manual. Computing for Science (CFS) Ltd., Daresbury Laboratory, UK, 1994.
42. G. Schaftenaar, MOLDEN, QCPE Bulletin, 12 (1992), Program No. 619, Quantum Chemistry Program Exchange, Indiana University, Bloomington, Indiana, USA. The URL for MOLDEN is http://www.caos.kun.nl/~schaft/molden/molden.html. See also http://qcpe5.chem.indiana.edu/qcpe.html. E-mail: [email protected].
43. G. Schaftenaar and J. H. Noordik, J. Comput. Aided Mol. Design, submitted (1998). MOLDEN: A Pre- and Post-Processing Program for Molecular and Electronic Structures.
44. D. E. Williams, PDM93, Electrostatic Potential-Derived Charges and Multipoles, 1993. Department of Chemistry, University of Louisville, Louisville, KY 40292. E-mail: dew01@xray5.chem.louisville.edu.
45. L. E. Chirlian and M. M. Francl, J. Comput. Chem., 8, 894 (1987). Atomic Charges Derived from Electrostatic Potentials: A Detailed Study.
46. B. H. Besler, K. M. Merz, and P. A. Kollman, J. Comput. Chem., 11, 431 (1990). Atomic Charges Derived from Semiempirical Methods.
47. C. M. Breneman and K. B. Wiberg, J. Comput. Chem., 11, 361 (1990). Determining Atom-Centered Monopoles from Molecular Electrostatic Potentials. The Need for High Sampling Density in Formamide Conformational Analysis.
48. C. I. Bayly, P. Cieplak, W. D. Cornell, and P. A. Kollman, J. Phys. Chem., 97, 10269 (1993). A Well-Behaved Electrostatic Potential Based Method Using Charge Restraints for Deriving Atomic Charges: The RESP Model.
49. C. A. Reynolds, J. W. Essex, and W. G. Richards, J. Am. Chem. Soc., 114, 9075 (1992). Atomic Charges for Variable Molecular Conformations.
50. D. J. Willock, S. L. Price, M. Leslie, and C. R. A. Catlow, J. Comput. Chem., 16, 628 (1995). The Relaxation of Molecular Crystal Structures Using a Distributed Multipole Electrostatic Model.
51. W. T. M. Mooij, B. P. van Eijck, S. L. Price, P. Verwer, and J. Kroon, J. Comput. Chem., 19, 459 (1998). Crystal Structure Predictions for Acetic Acid.
52. F. J. J. Leusen, H. J. Bruins Slot, J. H. Noordik, A. D. van der Haest, H. Wynberg, and A. Bruggink, Recl. Trav. Chim. Pays-Bas, 111, 111 (1992). Towards a Rational Design of Resolving Agents. Part IV. Crystal Packing Analyses and Molecular Mechanics Calculations for Five Pairs of Diastereomeric Salts of Ephedrine and a Cyclic Phosphoric Acid.
53. K. D. Gibson and H. A. Scheraga, J. Phys. Chem., 99, 3752 (1995). Crystal Packing Without Symmetry Constraints. 1. Test of a New Algorithm for Determining Crystal Structures by Energy Minimization.
54. D. E. Williams, Acta Crystallogr., Sect. A, 27, 452 (1971). Accelerated Convergence of Crystal-Lattice Potential Sums.
55. N. Karasawa and W. A. Goddard III, J. Phys. Chem., 93, 7320 (1989). Acceleration of Convergence for Lattice Sums.
56. B. P. van Eijck and J. Kroon, J. Phys. Chem. B, 101, 1096 (1997). Coulomb Energy of Polar Crystals.
57. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, in Numerical Recipes: The Art of Scientific Computing, Cambridge University Press, Cambridge, 1987, pp. 274-334. Minimization or Maximization of Functions.
58. A. Gavezzotti, Acta Crystallogr., Sect. B, 52, 201 (1996). Polymorphism of 7-Dimethylaminocyclopenta[c]coumarin: Packing Analysis and Generation of Trial Crystal Structures.
59. D. E. Williams, Acta Crystallogr., Sect. A, 52, 326 (1996). Ab Initio Molecular Packing Analysis.
60. D. C. Sorescu, B. M. Rice, and D. L. Thompson, J. Phys. Chem. B, 101, 798 (1997). Intermolecular Potential for the Hexahydro-1,3,5-trinitro-1,3,5-s-triazine Crystal (RDX): A Crystal Packing, Monte Carlo and Molecular Dynamics Study.
61. J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ, 1983, pp. 167-217.
62. D. E. Williams, Chem. Phys. Lett., 192, 538 (1992). OREMWA Prediction of the Structure of Benzene Clusters: Transition from Subsidiary to Global Energy Minima.
63. T. Shoda and D. E. Williams, J. Mol. Struct. (THEOCHEM), 357, 1 (1995). Molecular Packing Analysis. Part 3. The Prediction of m-Nitroaniline Crystal Structure.
64. L. C. Andrews, H. J. Bernstein, and G. A. Pelletier, Acta Crystallogr., Sect. A, 36, 248 (1980). A Perturbation Stable Cell Comparison Technique.
65. H. R. Karfunkel, B. Rohde, F. J. J. Leusen, R. J. Gdanitz, and G. Rihs, J. Comput. Chem., 14, 1125 (1993). Continuous Similarity Measure Between Nonoverlapping X-Ray Powder Diagrams of Different Crystal Modifications.
66. Cerius2 User Guide, March 1997, Molecular Simulations Inc., 9685 Scranton Road, San Diego, CA. The URL is http://www.msi.com/.
67. B. P. van Eijck and J. Kroon, J. Comput. Chem., 18, 1036 (1997). Fast Clustering of Equivalent Structures in Crystal Structure Prediction.
68. A. V. Dzyabchenko, Acta Crystallogr., Sect. B, 50, 414 (1994). Method of Crystal-Structure Similarity Searching.
69. D. E. Williams, Acta Crystallogr., Sect. A, 36, 715 (1980). Calculated Energy and Conformation of Clusters of Benzene Molecules and Their Relationship to Crystalline Benzene.
70. B. W. van de Waal, Acta Crystallogr., Sect. A, 37, 762 (1981). Significance of Calculated Cluster Conformations of Benzene: Comment on a Publication by D. E. Williams.
71. S. Oikawa, M. Tsuda, H. Kato, and T. Urabe, Acta Crystallogr., Sect. B, 41, 437 (1985). Growth Mechanism of Benzene Clusters and Crystalline Benzene.
72. A. Gavezzotti, PROMET3, A Program for the Generation of Possible Crystal Structures from the Molecular Structure of Organic Compounds, 1994. Available on request from A. Gavezzotti, University of Milan, Via Veneziano 21, I-20133 Milan, Italy. E-mail: gave@stinch12.csmtbo.mi.cnr.it.
73. J. Perlstein, J. Am. Chem. Soc., 114, 1955 (1992). Molecular Self-Assemblies: Monte Carlo Prediction for the Structure of the One-Dimensional Translation Aggregate.
74. J. Perlstein, J. Am. Chem. Soc., 116, 455 (1994). Molecular Self-Assemblies. 2. A Computational Method for the Prediction of the Structure of One-Dimensional Screw, Glide, and Inversion Molecular Aggregates and Implications for the Packing of Molecules in Monolayers and Crystals.
75. J. Perlstein, Chem. Mater., 6, 319 (1994). Molecular Self-Assemblies. 3. Quantitative Predictions for the Packing Geometry of Perylenedicarboximide Translation Aggregates and the Effects of Flexible End Groups. Implications for Monolayers and Three-Dimensional Crystal Structure Predictions.
76. J. Perlstein, J. Am. Chem. Soc., 116, 11420 (1994). Molecular Self-Assemblies. 4. Using Kitaigorodskii's Aufbau Principle for Quantitatively Predicting the Packing Geometry of Semiflexible Organic Molecules in Translation Monolayer Aggregates.
77. J. Perlstein, K. Steppe, S. Vaday, and E. M. N. Ndip, J. Am. Chem. Soc., 118, 8433 (1996). Molecular Self-Assemblies. 5. Analysis of the Vector Properties of Hydrogen Bonding in Crystal Engineering.
78. R. S. Payne, R. J. Roberts, R. C. Rowe, and R. Docherty, J. Comput. Chem., 19, 1 (1998). The Generation of Crystal Structures of Acetic Acid and Its Halogenated Analogues.
79. D. S. Coombes, G. K. Nagi, and S. L. Price, Chem. Phys. Lett., 265, 532 (1997). On the Lack of Hydrogen Bonds in the Crystal Structure of Alloxan.
80. A. Gavezzotti, J. Am. Chem. Soc., 113, 4622 (1991). Generation of Possible Crystal Structures from the Molecular Structure for Low-Polarity Organic Compounds.
81. R. P. Scaringe and S. Perez, J. Phys. Chem., 91, 2394 (1987). A Novel Method for Calculating the Structure of Small-Molecule Chains on Polymeric Templates.
References 363 82. N. L. Allinger, J. Am. Chem. SOC.,99, 8127 (1977). Conformational Analysis. 130. MM2. A Hydrocarbon Force Field Utilizing V, and V, Torsional Terms. 83. D. E. Williams, Program mpalmpg, Molecular Packing Analysis/Molecular Packing Graphics, 1996. Department of Chemistry, University of Louisville, Louisville, KY 40292. 84. N. Tajima, T. Tanaka, T. Arikawa, T. Sukarai, S. Teramae, and T. Hirano, Bull. Chem. Soc. Jpn., 68, 519 (1995). A Heuristic Molecular-Dynamics Approach for the Prediction of a Molecular Crystal Structure. 85. B. P. van Eijck, W. T. M. Mooij, and J. Kroon, Acta Crystullogr., Sect. B, 51, 99 (1995). Attempted Prediction of the Crystal Structures of Six Monosaccharides. 86. M. U. Schmidt and U. Englert,]. Chem. SOC.,Dalton Trans., 2077 (1996).Prediction of Crystal Structures. 87. W. R. Busing, WMIN, A Computer Program to Model Molecules and Crystals in Terms of Potential Energy Functions. Report ORNL-5747, 1981. Oak Ridge National Laboratory, Oak Ridge, TN 37831. E-mail: [email protected]. 88. R. J. Woods, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1996, Vol. 9, pp. 129-165. The Application of Molecular Modeling Techniques to the Determination of Oligosaccharide Solution Conformations. 89. D. Feller and E. R. Davidson, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1990, Vol. 1, pp. 1-44. Basis Sets for Ab Initio Molecular Orbital Calculations and Intermolecular Interactions. 90. W. F. van Gunsteren and H. J. C. Berendsen, Angew. Chem., Int. Ed. Engl., 29,992 (1990). Computer Simulation of Molecular Dynamics: Methodology, Applications, and Perspectives in Chemistry. 91. T. P. Lybrand, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1990, Vol. 1, pp. 295-320. Computer Simulation of Biomolecular Systems Using Molecular Dynamics and Free Energy Perturbation Methods. 
92. T. Arikawa, N. Tajima, S. Tsuzuki, K. Tanabe, and T. Hirano,J. Mol. Struct. (THEOCHEM), 339,115 (1995). A Possible Crystal Structure of 1,2-Dimethoxyethane: Prediction Based on a Lattice Variable Molecular Dynamics. 93. R. Hoffmann,]. Chem. Phys., 39, 1397 (1963). An Extended Hiickel Theory. I. Hydrocarbons./. Chem. Phys., 40,2745 (1964).Extended Hiickel Theory. 11. u Orbitals in the Azines. J. Chem. Phys., 40, 2474 (1964). Extended Huckel Theory. 111. Compounds of Boron and Nitrogen. J. Chem. Phys., 40,2480 (1964). Extended Hiickel Theory. IV. Carbonium Ions. 94. R. J. Gdanitz, Chem. Pbys. Lett., 190, 391 (1992). Prediction of Molecular Crystal Structures by Monte Carlo Simulated Annealing Without Reference to Diffraction Data. 95. H. R. Karfunkel and R. J. Gdanitz,]. Comput. Chem., 13,1171 (1992). Ab Initio Prediction of Possible Crystal Structures on the Basis of Molecular Information Only. 96. H. R. Karfunkel, F. J. J. Leusen, and R. J. Gdanitz, ]. Cornput.-Aided Muter. Design, 1, 177 (1993). The Ab Initio Prediction of Yet Unknown Molecular Crystal Structures by Solving the Crystal Packing Problem. 97. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, 1. Chem. Phys., 21, 1087 (1953). Equation of State Calculations by Fast Computing Machines. 98. H. R. Karfunkel and F. J. J. Leusen, Speedup, 6, 43 (1992). Practical Aspects of Predicting Possible Crystal Structures on the Basis of Molecular Information Only. 99. F. J. J. Leusen,J. Crystl. Growth, 166, 900 (1996). Ah Initio Prediction of Polymorphs. 100. D. E. Williams, PCK83. QCPE Program No. 548, 1983. Quantum Chemistry Program Exchange, Indiana University, Bloomington, IN 47405. 101. A. L. Spek, Acta Crystallogr., Sect. A, 46, C34 (1990). PLATON, An Integrated Tool for the Analysis of the Results of a Single Crystal Structure Determination. 102. V. L. Karen and A. D. Mighell, NIST*LATTICE, A Program to Analyze Lattice Relationships, Technical Note 1290, 1991. 
National Institute of Standards and Technology, Gaithersburg, MD 20899. E-mail: [email protected].
364 Comtmter Simulation to Predict Possible Crystal Polymorbhs 103. K. Mika, J. Hauck, and U. Funk-Kath, J . Appl. Crystullogr,, 27, 1052 (1994). Space-Group Recognition with the Modified Library Program ACMM. 104. Y. Le Page, D. D. Klug, and J. S. Tse, J . Appl. Crystullogr., 29, 503 (1996). Derivation of Conventional Crystallographic Descriptions of New Phases from Results of Ab-Initio Inorganic Structure Modelling. 105. Y. Le Page,]. Appl. Crystullogr., 20,264 (1987). Computer Derivation of the Symmetry Elements Implied in a Structure Description. 106. Y. Le Page,]. Appl. Crystullogr., 21, 983 (1988).MISSYM 1.1-A Flexible New Release. 107. M. Saunders, K. N. Houk, Y.-D. Wu, W. C. Still, M. Lipton, G. Chang, and W. C. Guida, J . Am. Chem. Soc., 112,1419 (1990). Conformations of Cycloheptadecane. A Comparison of Methods for Conformational Searching. 108. A. R. Leach, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991, Vol. 2, pp. 1-55. A Survey of Methods for Searching the Conformational Space of Small and Medium-Sized Molecules. 109. N. G. Hunt and F. E. Cohen, ]. Comput. Chem., 17,1857 (1996). Fast Lookup Tables for Interatomic Interactions. 110. P.-E. Werner, L. Eriksson, and M. Westdahl,]. Appl. Crystullogr., 18,367 (1985). TREOR, a Semi-Exhaustive Trial-and-Error Powder Indexing Program for All Symmetries. 111. A. Boultif and D. Louer,]. Appl. Crystullogr., 24, 987 (1991). Indexing of Powder Diffraction Patterns for Low-Symmetry Lattices by the Successive Dichotomy Method. 112. H. M. Rietveld, J. Appl. Crystullogr., 2, 65 (1969). A Profile Refinement Method for Nuclear and Magnetic Structures. 113. M. Orozco and F. J. Luque, J . Comput. Chem., 11, 909 (1990). On the Use of AM1 and MNDO Wave Functions to Compute Accurate Electrostatic Charges. 114. M. J. Hwang, T. P. Stockfisch, and A. T. Hagler, J. Am. Chem. Soc., 116, 2515 (1994). Derivation of Class I1 Force Fields. 2. 
Derivation and Characterization of a Class I1 Force Field, CFF93, for the Alkyl Functional Group and Alkane Molecules. 115. N. Padmaja, S. Ramakumar, and M. A. Viswamitra, Actu Crystullogr., Sect. A , 46, 725 (1990). Space-Group Frequencies of Proteins and of Organic Compounds with More than One Formula Unit in the Asymmetric Unit. 116. W. H. Baur and D. Kassner, Actu Crystullogr., Sect. B, 48,356 (1992).The Perils of Cc: Comparing the Frequencies of Falsely Assigned Space Groups with Their General Population. 117. A. Gavezzotti and G. Filippini,J. Phys. Chem., 98,4831 (1994). Geometry of the Intermolecular X-H..Y (X, Y = N, 0 )Hydrogen Bond and the Calibration of Empirical Hydrogen-Bond Potentials. 118. H. R. Karfunkel, Z. J. Wu, A. Burkhard, G. Rihs, D. Sinnreich, H. M. Buerger, and J. Stanek, Actu Crystullogr., Sect. B, 52, 555 (1996). Crystal Packing Calculations and Rietveld Refinement in Elucidating the Crystal Structure of Two Modifications of 4-Amidinoindanone Guanylhydrazone. 119. T. D. J. Debaerdernaeker, Cryst. Struct. Commun., 1, 39 (1972). 3-Hydroxyestra-1,3,5(10)trien-17-one (Estrone), C,,H,,O,. 120. B. Busetta, C. Courseille, and M. Hospital, Actu Crystullogr., Sect. B, 29,298 (1973). Structures Cristallines et Moliculaires de Trois Formes Polymorphes de I'Oestrone. 121. S. L. Mayo, B. D. Olafson, and W. A. Goddard HI,]. Phys. Chem., 94, 8897 (1990).DREIDING: A Generic Force Field for Molecular Simulations. 122. M. Haisa, S. Kashino, R. Kaiwa, and H. Maeda, Actu Crystullogr., Sect. B, 32,1283 (1976). The Monoclinic Form of p-Hydroxyacetanilide. 123. M. Haisa, S. Kashino, and H. Maeda, Actu Crystullogr., Sect. B, 30,2510 (1974). The Orthorhombic Form of p-Hydroxyacetanilide. 124. J. E. Bertie and R. W. Wilton,J. Chem. Phys., 75, 1639 (1981). Acetic Acid Under Pressure: The Formation Below 0 "C, X-Ray Powder Diffraction Pattern, and Far-Infrared Absorption Spectrum of Phase 11.
References 365 125. T. Shoda, K. Yamahara, K. Okazaki, and D. E. Williams, J. Mol. S t r u t . (THEOCHEM), 333,267 (1995).Molecular Packing Analysis of Benzene Crystals. Part 2. Prediction of Experimental Crystal Structure Polymorphs at Low and High Pressure. 126. C . E. Weir, G. J. Piermarini, and S. Block,]. Chem. Phys., 50,2089 (1969).Crystallography of Some High-pressure Forms of C,H,, CS,, Br,, CCI,, and KNO,. 127. R. Fourme, D. Andrk, and M. Renaud, Actu Crystallogr., Sect. B, 27,1275 (1 971). A Redetermination and Group-Refinement of the Molecular Packing of Benzene I1 at 25 kilobars. 128. S. R. Byrn, P. A. Sutton, B. Tobias, J. Frye, and P. Main,]. Am. Chem. SOC., 110, 1609 (1988). The Crystal Structure, Solid-state NMR Spectra, and Oxygen Reactivity of Five Crystal Forms of Prednisolone tert-Butylacetate. 129. F. J. J. Leusen, unpublished results. 130. K. J. North, in Pigment Handbook, P. A. Lewis, Ed., Wiley, New York, New York, 1988, Vol. 1, Section 1-D-j. 131. C. W. Manger and W. S. Struve, U.S. Patent 2,844,581 (1958);W. S. Struve, U.S. Patent 2,844,485 (1959). 132. G. D. Potts, W. Jones, J. F. Bullock, S. J. Andrews, and S. J. Maginn, 1. Chem. Soc., Chem. Commtln., 2565 (1994).The Crystal Structure of Quinacridone: An Archetypal Pigment. 133. F. J. J. Leusen, M . R. S. Pinches, N. E. Austin, S. J. Maginn, R. Lovell, H. R. Karfunkel, and E. F. Paulus, to be published.
CHAPTER 8
Computational Chemistry in France: A Historical Survey

Jean-Louis Rivail and Bernard Maigret

Laboratoire de Chimie théorique, Unité Mixte de Recherche au Centre National de la Recherche Scientifique (CNRS) No. 7565, Institut Nancéien de Chimie moléculaire, Université Henri Poincaré, Nancy 1, Domaine Universitaire Victor Grignard, B.P. 239, 54506 Vandœuvre-lès-Nancy, France
Reviews in Computational Chemistry, Volume 12. Kenny B. Lipkowitz and Donald B. Boyd, Editors. Wiley-VCH, John Wiley and Sons, Inc., New York, © 1998

INTRODUCTION

The development of stored-program digital computers has been perceived as a major event by many theoretical chemists all over the world. Therefore, the chronology analyzed in the case of the United States1 may apply to the situation in France, provided some adjustments, mainly with respect to a difference in aggregate computing power, are made to compensate for the size of our country.

Computational chemistry started with early attempts to integrate the Schrödinger equation for molecules. This activity is, of course, still flourishing and gives rise to intensive numerical calculations. As time progressed, other applications appeared. These include statistical computations by means of Monte Carlo or molecular dynamics simulations, in which the thought process is different from that in quantum chemistry because a simulation tracks a large number of events designed to mimic what happens at the microscopic level in a macroscopic sample. This kind of approach can be classified as “numerical experiments.” A third field in which computers became an indispensable tool for chemists is that of information processing. This is another important aspect of computers and computational science, which are expressed by the French words ordinateurs and informatique, respectively.

This short historical essay starts by providing some details on the beginning of computational quantum chemistry in France after World War II. We then consider the computational aspects of statistical mechanics. The main contributions of French scientists to the development of specialized software are briefly recounted. Processing of chemical information is then mentioned, followed by a description of the evolution of the computing facilities available to our chemists; the extent of government funding of the field is mentioned as well.

The importance of the facts surrounding the roots of computational chemistry in France can be safely evaluated only with some distance in time. We do not consider ourselves capable of providing an objective opinion on what happened during the past ten years or so. Accordingly, our bibliography is strongly focused on early writings, which may not be known to younger readers. The few references we selected deal with seminal papers which, according to French tradition, were often published in French, either in the Comptes Rendus de l’Académie des Sciences (C.R. Acad. Sci. Paris) or the Journal de Chimie Physique. In general, the important work was followed by other papers or reviews published in various specialized journals, often in English.
EARLY AGE OF THEORETICAL CHEMISTRY

In France, before World War II, the application of wave mechanics to understanding the structure of matter was first a subject for physicists dealing with the electronic structure of atoms. In the 1930s, if one excepts Louis de Broglie, who spent most of his life working on the interpretation of quantum mechanics, the most prominent French scientist in the field of electronic structure of atoms was Léon Brillouin. Interest in molecules, and in chemical implications, came later. The first use of the term “theoretical chemistry” can be found in the name of a laboratory, the “Centre de Chimie théorique de France,” founded in Paris in 1943 by Raymond Daudel under the patronage of Louis de Broglie and Irène and Frédéric Joliot-Curie. This laboratory obtained official acknowledgment in 1948, when the Centre National de la Recherche Scientifique (CNRS) started to support its activities.

Theoretical chemistry was officially recognized as a branch of science in France a half-century ago. In April 1948 an international symposium organized in Paris under the auspices of the CNRS and the Rockefeller Foundation offered French chemists the opportunity to interact with the world leaders in this new science, including C. A. Coulson, J. A. A. Ketelaar, H. C. Longuet-Higgins, and R. S. Mulliken. In addition, the first chair entitled “theoretical chemistry” was created in October 1948 at the University of Nancy, and its first holder was Jean Barriol, whose early works dealt with molecular quantum mechanics from a rather basic point of view (group theory). Afterward Barriol became involved with the study of electric polarization effects on single molecules and, later, on liquids, from both theoretical and experimental points of view.

This growing interest in theoretical chemistry became quite visible during the 1950s. Bernard Pullman, who started his career in a CNRS position, was offered a professorship in quantum chemistry by the Sorbonne in 1954. His reputation was already well established on the basis of his early works, mainly devoted to the properties of π-electron systems and embodied in the book Les théories électroniques de la chimie organique. The book, which was written in collaboration with his wife, Alberte Pullman, and published in 1953, can be considered another founding event of theoretical chemistry in France. At the same time, several other universities invited a theoretical chemist to join their faculty. Thus André Julg moved to Marseille in 1957, and in 1958 Bordeaux created a chair of theoretical chemistry for Jean Hoarau. Following these institutions were Rennes (for Claude Guérillot) and Pau (for Jean Deschamps).

The other major events of this period are the transformation, in 1957, of Daudel’s Centre de Chimie théorique de France into a CNRS research center called Centre de Mécanique Ondulatoire Appliquée (CMOA) and the foundation in 1958 of a laboratory of theoretical biochemistry by Bernard and Alberte Pullman, resulting in the move of the Pullmans to the Institut de Biologie Physicochimique in Paris. The CMOA moved in 1962 into a vast building, north of Paris, in which the CNRS installed a CDC 7600 multipurpose computer, devoted to both atomic and molecular computations.
During the same period, a new quantum chemistry group was founded at the École Normale Supérieure, under the supervision of Josiane Serre. The situation then remained stable for several years, so that the list of French universities having a group active in the field of theoretical chemistry was still rather limited. This situation can be explained by noting that the various departments of chemistry were composed exclusively of experimentalists, chemists who considered the development of their own disciplines to be more important and did not expect much from theory, which they regarded as some kind of “icing on the cake.” This attitude, which may have been observed in other countries, was particularly strong in France because of old academic traditions dating from the time when the leaders of French chemistry did not accept the atomic theory. As a consequence, activity in theoretical chemistry was above a critical threshold only in Paris, where, thanks to Raymond Daudel and the Pullmans, quite an active intellectual life developed. The vitality of these groups is evident in their scientific production and also in two books that played an important, worldwide role in promoting computational quantum chemistry: Quantum Chemistry, Methods and Applications, published in 1959 by Raymond Daudel, Roland Lefebvre, and Carl Moser,2 and Quantum Biochemistry, published in 1963 by Bernard and Alberte Pullman.3
A community started to develop as a result of the monthly seminars of the CMOA. In addition, the summer schools organized in Menton, in the South of France, soon became an important international meeting place and, from a French point of view, played an important role in helping the small minority of theoretical chemists to take part in the worldwide adventure of their discipline. Fortunately the situation was not as stark as it appears if one looks only at the number of academic positions at that time. The CNRS, which created its own hierarchy for full-time research scientists (equivalent to the hierarchy of university professors), offered such positions to many theoretical chemists, thereby expanding the number of theoretical chemists active in France. Indeed, if one limits the list to our now retired colleagues, one should remember that Alberte Pullman, Gaston Berthier, Odilon Chalvet, Carl Moser, Roland Lefebvre, and Alain Veillard spent their full careers with the CNRS.

The main characteristic of French theoretical chemistry of the 1950s, as elsewhere in the world for the most part, is that the computations were performed by hand with a desktop mechanical calculator. The self-consistent field (SCF) computation on the π system of azulene (10 electrons) by André Julg is a typical example of the work done by the pioneers. After having computed the 4500 integrals required for this system, Julg started the SCF iterations, which exhibited a strong divergence that resisted the standard numerical convergence recipes of the time. By looking at the results, he found a very efficient graphical method, which led him to the solution before his competitors, who were using a computer.4 Nevertheless, his own estimate of the human time spent on this problem is more than 4000 hours!

During these early years, the French theoretical chemists played an important part in the applications of quantum mechanics to chemistry, as described in many papers.
Among the most original contributions, one may select a few topics such as:

the relationship between the electronic structure of aromatic compounds and carcinogenicity5
the so-called “loge” theory, which is one of the early attempts to analyze the electronic density of atoms and molecules on the basis of information theory6
an application of London (gauge-invariant) atomic orbitals (GIAO) to the computation of molecular magnetic susceptibility7
the first unrestricted Hartree-Fock computations8
an improved SCF-LCAO method for π electrons.9
COMPUTATIONAL QUANTUM CHEMISTRY

Modern quantum chemistry is strongly dependent on time-consuming computations, so the introduction of the first computers, at the end of the 1950s, initiated a new era in theoretical chemistry. The first all-electron valence bond calculation was performed on the hydrogen fluoride (HF) molecule in 1953.10 The first all-electron SCF studies using molecular orbitals (MO) represented by linear combinations of atomic orbitals (LCAO), still on diatomics, came later.11,12 Finally, the first ab initio computation using a Gaussian-type function basis set on a polyatomic molecule, hydrazine, was published in 1966.13

The general tendency that soon appeared in the French community, as elsewhere,1,14 was for the theoretical chemists to split into two categories: the methodologists, who were trying to improve the accuracy and the efficiency of the methods, and those who were in contact with experiment.

The methodologists were mainly concerned with electron correlation. Most of them stem from the Pullmans’ group and later from the École Normale Supérieure, and, in both cases, Gaston Berthier played an important part in the development of the French school. In continuation of the early work of Brillouin,15 multiconfiguration computations became a permanent area of interest,16 leading to the generalized Brillouin theorem.17 These works are at the basis of multiconfiguration self-consistent field (MCSCF) methodologies that are still in use.18 In addition, the problem of electron correlation and the selection of the configurations in configuration interaction (CI) computations received an efficient solution in the form of a method with a long name: configuration interaction by perturbation with multiconfigurational zeroth-order wavefunction selected iteratively (CIPSI).19 The main promoters of this method were Jean-Paul Malrieu and Jean-Pierre Daudey. Malrieu moved from Paris to Toulouse in 1974, where he joined Philippe Durand’s group of theoretical physicists.
Daudey joined them in 1978, and nowadays the Toulouse group is quite prosperous and active in the fields of electron correlation, pseudopotentials, and effective Hamiltonians. Similarly, Bernard Lévy moved from the École Normale Supérieure to Orsay in 1985, where his group is mainly concerned with the accurate treatment of large (from a quantum chemist’s point of view) systems.

The second category of quantum chemists has more members, working in various fields. Chemical reactivity is, of course, one of the major subjects for chemists, and several groups soon took a leading position in the field.20 This is particularly true for the group that Lionel Salem founded when he settled in Orsay.21 He introduced an orbital analysis in chemical reactivity studies and initiated reaction dynamics studies. His group has spawned groups in Lyon (Bernard Bigot), Montpellier (Odile Eisenstein and Claude Leforestier), and Paris (Alain Sevin); but the theoretical chemistry laboratory in Orsay, now headed by Xavier Chapuisat, maintains its tradition of excellence. In 1968, Alain Veillard moved to Strasbourg, where he started a laboratory that soon became recognized for the study of transition metal compounds.22 Later, the laboratory, now directed by Elise Kochanski, broadened its interests to include the study of intermolecular interactions.
In Nancy, Jean Barriol retired in 1974. His successor, Jean-Louis Rivail, started the very early quantum chemical studies of solvated species,23 and when Bernard Maigret joined the group in 1991, the field of investigation was widened to include biomolecular systems. Among the early theoretical chemistry groups is the laboratory in Bordeaux, where Jean-Claude Rayez introduced reaction dynamics and kinetics studies in connection with experimentalists working on molecular beams. The group in Pau, under the direction of Alain Dargelos, is concerned with the computation of spectroscopic properties. Daudel’s CMOA disappeared in 1984, but most of his co-workers moved to the Université Pierre et Marie Curie, where Marcel Allavena founded a laboratory called DIM (dynamique des interactions moléculaires). This group is now part of the Paris theoretical chemistry laboratory directed by Alain Sevin.

In the field of theoretical biochemistry, the successes of Alberte and Bernard Pullman are now legendary. This group was truly monumental in the development of biomolecular computing in France, and the many major scientists trained in their laboratory were the seeds for the extensions of computational chemistry in our country. The tradition of this laboratory is maintained by one of these seeds, namely Richard Lavery, who focused on modeling nucleic acids.24,25 But many other researchers who defended their Ph.D. theses in this laboratory continue to pursue their own research in the basic spirit of the Pullmans. Since about 1960, and especially after the pivotal publication of their book reporting simple Hückel (π-electron) molecular orbital calculations on biomolecules,3 the Pullmans saw their laboratory become one of the most attractive and creative centers for molecular computations.
Their pioneering work and leadership were acknowledged by an extraordinary number of awards for their major contribution to the development and recognition of what is now called computational chemistry. Because they awakened scientists to the idea that computations were possible and informative for drug molecules, Bernard and Alberte Pullman opened the door to the present state of the art in ligand design, quantitative structure-activity relationships (QSAR), and molecular simulations on biomolecules.

Mention should also be made of a new field that appeared in the late 1980s, theoretical astrochemistry, under the guidance of Yves Ellinger.

The late Pierre Claverie was one of the most influential French researchers in the field of intermolecular interactions. His work dedicated to the foundation of molecular force fields and their links with quantum chemistry was truly visionary.26,27 From about 1970, Claverie pointed out the importance of taking polarization effects into account in molecular mechanics, and he proposed the concept of self-encased different levels of computations regarding solvent effects, thereby showing the road to the present development of hybrid quantum mechanics/molecular mechanics (QM/MM) methods.
STATISTICAL MECHANICS

With the seminal work of Jean Yvon28 on the statistical mechanical treatment of liquids, one would have expected an original school in this field. This did not happen, probably because of the difficulties of the subject and the success of quantum mechanics. Hence, for years, the macroscopic properties of matter were far from the concern of theoretical chemists, except in rare cases.29 Following the advent of the method of Monte Carlo sampling, a decisive change occurred in the late 1960s. The work of Loup Verlet30 played an important part, together with the availability of increasingly powerful computational facilities, in the development of molecular dynamics and of simulations of molecular liquids. Verlet’s co-workers in Orsay are still active in this field. Some years later, Savo Bratos founded in Paris a laboratory in which the theoretical treatment of molecular liquids soon reached a high level.31 Nowadays, some important chemical problems are approached by means of computational statistical thermodynamics.32

In the meantime, the barriers between various theoretical fields have tended to vanish, and the interplay between statistical mechanics and quantum chemistry is becoming stronger and stronger, leading to a more comprehensive approach to chemical problems in condensed phases.
SOFTWARE DEVELOPMENT

Among the outstanding contributions to computational quantum chemistry, we have mentioned the CIPSI program,19 which allows efficient post-Hartree-Fock computations. A Hartree-Fock molecular orbital program, ASTERIX, adapted to vector and parallel computers, was developed in Strasbourg.33

Before these codes, which deal with high-level ab initio computations, some successful attempts were made to use semiempirical methods to solve problems not tractable at a more rigorous level. The best example is probably the perturbation configuration interaction using localized orbitals (PCILO) method,34 which combines the simplicity of the CNDO (and INDO) approximations with a moderate configuration interaction on a basis of localized orbitals. This method gave a fair estimate of conformational energy changes. Thanks to this method, the Pullmans’ group produced a series of pioneering works in the conformational analysis of modest-sized biomolecules of many types. The discovery of the so-called C7 conformation of dipeptides was a result.35

The first system of programs to allow a full analytic computation of energy derivatives and geometry optimization within the framework of any semiempirical MO theory method was written in 1972 under the name of GEOMO.36 It has been further improved by including solvent effect simulations, giving rise to the GEOMOS package.37

On the basis of a sound analysis of intermolecular interactions, performed by means of a quantum perturbational approach, Claverie derived a force field that could suitably represent intermolecular interactions.26 The electrostatic interactions are described by means of a distributed multipole analysis, and induction effects are taken into account. The force field called sum of interactions between fragments computed ab initio (SIBFA)27 originated from this study and was subsequently applied successfully to many biophysical problems.

Also in the field of biomacromolecules, several powerful methods for conformational analysis and shape descriptors of nucleic acids24 and proteins25 have recently been developed by Lavery, who maintains the tradition of the theoretical biochemistry laboratory in Paris.

Several groups have a special interest in chemical information handling, in particular the subjects of chemical structure storage and computer-assisted chemical synthesis. The most remarkable French contribution in this area is the DARC system developed by Jacques-Émile Dubois for molecular encoding and chemical information retrieval.38
COMPUTATIONAL FACILITIES

The 1950s were characterized by early uses of computers for solving chemical problems. The first mention of such results can be found in the papers of the Daudel and Pullman39 groups. These first computations were made possible because of access granted to those scientists by hardware companies (Bull, IBM). Nevertheless, during the second half of the 1950s, several academic institutions started purchasing their own computers (an IBM 604, soon replaced by a model 650, was installed at the University of Nancy in 1957). These local, multipurpose computing centers developed along with computer technology, but without offering the highest computational power of the moment, as available elsewhere.

At the end of the 1960s, these facilities appeared insufficient to chemists, who then asked for the creation of a specialized center devoted to theoretical chemistry or, at least, to scientific computing. The CNRS director, Pierre Jacquinot, and his successor, Hubert Curien, favored the second solution and entrusted an astrophysicist, Janine Connes, with the creation of a national center for scientific computing. This facility, called Centre Inter-Regional de Calcul Electronique (CIRCE), was founded on January 1, 1969. From its very beginning, the organization hosted another institution named the Centre Européen de Calculs Atomiques et Moléculaires (CECAM), an international project that organized workshops on computational chemistry and physics. In 1993, CIRCE was replaced by another organization still in Orsay, Institut pour le Développement et la Recherche en Informatique Scientifique (IDRIS), and CECAM moved to Lyon.

The first vector machine in France (a Cray 1) was bought by the Atomic Energy Agency (CEA) and opened to academics in 1981. In 1983, vector computing was organized independently from CIRCE, to be shared by several research institutions including the CNRS. This situation changed with the termination of CIRCE and the creation of IDRIS, which now offers vector and parallel computing facilities (three Cray supercomputers: a C98, a C94, and a T3E). In the meantime, most of the local computing centers disappeared and, at the behest of the Ministry of Education, another national center was founded in Montpellier, the Centre National Universitaire Sud de Calcul (CNUSC). The CNUSC is accessible from everywhere in the country by means of the Internet.
INDUSTRY

The development of computational chemistry at French chemical and pharmaceutical companies is relatively recent. Only Roussel-Uclaf (Romainville) developed a real computational chemistry group in the 1970s, built around Gilles Moreau and N. Claude Cohen, in close collaboration with the Pullmans’ laboratory. The two Roussel-Uclaf workers developed, respectively, an original method for QSAR (the autocorrelation method) and the SCRIPT program for molecular modeling. Both these tools are still being used and developed inside the current organization of the company (Hoechst Marion Roussel). Rhône-Poulenc Rorer has groups of computational chemists in France, England, and the United States. Like most other pharmaceutical companies, however, Rhône-Poulenc makes use of commercial software as “black boxes,” and thus the pioneering efforts of French software developers are of little influence.
TEACHING COMPUTATIONAL CHEMISTRY

Most of the curricula in chemistry or physical chemistry include at least an introduction to scientific computing. Specific applications to chemistry are taught in many universities. Computational chemistry (in French, Chimie informatique, where “informatique” is an adjective) is considered a specialty and appears at the predoctoral level. Since 1987, a national predoctoral program (Diplôme d’Études Approfondies), called Chimie informatique et théorique, has been taught jointly in seven universities: Henri Poincaré (Nancy), Paris Sud (Orsay), Pierre et Marie Curie (Paris VI), Denis Diderot (Paris VII), Rennes I, Louis Pasteur (Strasbourg), and Paul Sabatier (Toulouse). Its aim is to teach, at a research level, basic computational science, quantum chemistry, and molecular modeling; other applications of computers in chemistry are not excluded, however. For example, some students pursue work in diverse fields such as experimental control and data processing in nuclear magnetic resonance spectroscopy. So far, all the students, especially those who finished with a thesis in computational chemistry, have been able to find either academic or industrial positions.
GOVERNMENT FUNDING

In France a large part of academic research (universities and the CNRS) is supported by the government. The early work in theoretical chemistry would not have been possible without full public funding. We have mentioned, for example, the crucial role played by the CNRS in the development of theoretical chemistry. Similarly, the computing centers, in particular CIRCE, CNUSC, and now IDRIS, are almost entirely funded by national bodies. In addition, some research programs intended for developing special aspects of computing have been launched by the CNRS. This has been the case for a program on chemical modeling, which was a joint CNRS-IBM initiative in 1988. More recently, a project was initiated on computer-aided synthesis, and another project, which is sponsored in part by the French Petroleum Institute (IFP), deals with quantum mechanics applied to heterogeneous catalysis. Finally, specialized programs dealing, for instance, with astrophysics and astrochemistry are also concerned with computing.

Among the other governmental research institutions having some activity in the field of computational chemistry, we have mentioned CEA. The National Research Institute for Informatics and Automatics (INRIA) is another example. Both institutions collaborate with the CNRS and the universities.
CONCLUSION

In half a century, the impact of computers in chemistry has developed to an extent that was probably difficult to predict when the first attempts were made. This situation is common to other sciences in France and elsewhere. Progress correlates with spectacular increases in computer performance. Obviously, the evolution is not finished, and, no doubt, some studies that are out of reach of today’s computers will be feasible on the hardware of the future. For the time being, the users of existing computers do not share much among themselves except the use of the machines. For the future, one may infer that some of them, in particular those who develop new codes, will benefit even more than now by exchanging knowledge and solutions, mainly because the machines are becoming more and more complex, and the variety of problems handled by them is rapidly becoming more comprehensive in scope. This evolution is becoming apparent in many places; for instance, probably one of the most successful attempts at making connections between mathematicians, computer scientists, physicists, chemists, geologists, and other scientists is occurring at the Charles Hermite Center in Nancy.40 In this joint project involving local universities, the INRIA, and the CNRS, there is an intense scientific activity dealing with new computational strategies for intensive computation and modeling to run on a 64-processor Origin 2000 Silicon Graphics parallel computer. In the near future, improved software and optimal use of such modern computers will significantly enhance interdisciplinary cooperation.
ACKNOWLEDGMENTS

The authors are grateful to their colleagues Gaston Berthier, Janine Connes, Raymond Daudel, André Julg, Jean Hoarau, and Alberte Pullman for providing useful information.
REFERENCES

1. J. D. Bolcer and R. B. Hermann, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1994, Vol. 5, pp. 1-63. The Development of Computational Chemistry in the United States.
2. R. Daudel, R. Lefebvre, and C. Moser, Quantum Chemistry, Methods and Applications, Wiley-Interscience, New York, 1959.
3. B. Pullman and A. Pullman, Quantum Biochemistry, Wiley-Interscience, New York, 1963.
4. A. Julg, C.R. Acad. Sci. Paris, 239, 1498 (1954). Structure électronique de l’azulène: Étude par la méthode du champ self-consistent. A. Julg, J. Chim. Phys., 52, 377 (1955). Étude de l’azulène par la méthode du champ moléculaire self-consistent.
5. A. Pullman, C.R. Acad. Sci. Paris, 221, 140 (1945). Mise en évidence d’une liaison très apte à l’addition (région K) chez certaines molécules cancérigènes.
6. R. Daudel, S. Odiot, and H. Brion, C.R. Acad. Sci. Paris, 238, 458 (1954). La notion de loge et la signification géométrique de la notion de couche dans le cortège électronique des atomes. H. Brion, R. Daudel, and S. Odiot, J. Chim. Phys., 51, 553 (1954). Théorie de la localisabilité des corpuscules. IV. Emploi de la notion de loge dans l’étude des liaisons chimiques.
7. M. Mayot, G. Berthier, and B. Pullman, J. Phys. Radium, 12, 652 (1951). Calcul quantique de l’anisotropie diamagnétique des molécules organiques. I. La méthode. G. Berthier, M. Mayot, and B. Pullman, J. Phys. Radium, 12, 717 (1951). Calcul quantique de l’anisotropie diamagnétique des molécules organiques. II. Principaux groupes d’hydrocarbures aromatiques. J. Hoarau, J. Chim. Phys., 57, 855 (1960). Calcul de l’anisotropie diamagnétique de quelques systèmes graphitiques.
8. G. Berthier, C.R. Acad. Sci. Paris, 238, 91 (1954). Extension de la méthode du champ moléculaire self-consistent à l’étude des états à couches incomplètes. G. Berthier, J. Chim. Phys., 51, 363 (1954). Configurations électroniques incomplètes. Partie I. La méthode du champ moléculaire self-consistent et l’étude des états à couches incomplètes.
9. A. Julg, J. Chim. Phys., 58, 19 (1960). Traitement L.C.A.O. amélioré des molécules conjuguées. I. Théorie générale. Applications aux hydrocarbures.
10. D. Kastler, C.R. Acad. Sci. Paris, 236, 1271 (1953). Théorie quantique de la molécule d’acide fluorhydrique.
11. G. Berthier, Mol. Phys., 2, 225 (1959). A Self-consistent Field for the H, Molecule.
12. H. Brion, C. Moser, and M. Yamazaki, J. Chem. Phys., 30, 673 (1959). Electronic Structure of Nitric Oxide.
13. A. Veillard, Theor. Chim. Acta, 5, 413 (1966). Quantum Mechanical Calculations on Barriers to Internal Rotation. I. Self-consistent Field Wavefunctions and Potential Energy Curves for the Hydrazine Molecule in the Gaussian Approximation.
14. S. J. Smith and B. T. Sutcliffe, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1997, Vol. 10, pp. 271-316. The Development of Computational Chemistry in the United Kingdom. See also, C. A. Coulson, Rev. Mod. Phys., 32, 170 (1960). Present State of Molecular Structure Calculations.
15. L. Brillouin, Actualités Scientifiques et Industrielles, Vol. 159, Hermann, Paris, 1934. Les champs “self-consistents” de Hartree et de Fock.
16. R. Lefebvre, C.R. Acad. Sci. Paris, 237, 1158 (1953). Sur l’application de la méthode d’interaction de configuration aux molécules.
17. B. Lévy and G. Berthier, Int. J. Quantum Chem., 2, 307 (1968). Generalized Brillouin Theorem for the Multiconfigurational SCF Method.
18. B. Lévy, Chem. Phys. Lett., 4, 17 (1969). Multi-configuration Self-consistent Wavefunctions of Formaldehyde. (This is probably the first MCSCF computation on a polyatomic molecule.)
19. B. Huron, J. P. Malrieu, and P. Rancurel, J. Chem. Phys., 58, 5745 (1973). Iterative Perturbation Calculations of Ground and Excited State Energies from Multiconfigurational Zeroth-Order Wavefunctions.
20. R. Daudel and O. Chalvet, Colloques Internationaux du CNRS (CNRS, Paris, 1958), Calcul des Fonctions d’Onde Moléculaires, No. 82, p. 389. Théorie du mécanisme des réactions. III. Sur l’application de la Chimie Quantique à la détermination des mécanismes de réaction.
21. L. Salem, J. Am. Chem. Soc., 90, 543 (1968). Intermolecular Orbital Theory of the Interaction Between Conjugated Systems. I. General Theory. L. Salem, J. Am. Chem. Soc., 90, 553 (1968). Intermolecular Orbital Theory of the Interaction Between Conjugated Systems. II. Thermal and Photochemical Cycloadditions.
22. H. M. Gladney and A. Veillard, Phys. Rev., 180, 385 (1969). A Limited Basis Set Hartree-Fock Theory of Ni;-.
23. D. Rinaldi and J.-L. Rivail, Theor. Chim. Acta, 32, 57 (1973). Polarisabilités moléculaires et effet diélectrique de milieu à l’état liquide. Étude théorique de la molécule d’eau et de ses dimères.
24. R. Lavery and H. Sklenar, J. Biomol. Struct. Dyn., 6, 63 (1988). The Definition of Generalized Helicoidal Parameters and of Axis Curvature for Irregular Nucleic Acids. R. Lavery and H. Sklenar, J. Biomol. Struct. Dyn., 6, 655 (1989). Defining the Structure of Irregular Nucleic Acids: Conventions and Principles. R. Lavery, in Unusual DNA Structures, R. D. Wells and S. C. Harvey, Eds., Springer-Verlag, New York, 1988, pp. 189-206. DNA Flexibility Under Control: the JUMNA Algorithm and Its Application to BZ Junctions. R. Lavery, K. Zakrzewska, and H. Sklenar, Comput. Phys. Commun., 91, 135 (1995). JUMNA: Junction Minimisation of Nucleic Acids.
25. H. Sklenar, C. Etchebest, and R. Lavery, Proteins: Struct., Funct., Genet., 6, 46 (1989). Describing Protein Structure: A General Algorithm Yielding Complete Helicoidal Parameters and a Unique Overall Axis.
26. M. J. Huron and P. Claverie, Chem. Phys. Lett., 4, 429 (1969). Practical Improvements for the Calculation of Intermolecular Energies. M. J. Huron and P. Claverie, Chem. Phys. Lett., 9, 194 (1971). Study of Solute-Solvent Interactions.
27. N. Gresh, P. Claverie, and A. Pullman, Int. J. Quantum Chem., 13, 243 (1979). Intermolecular Interactions: Reproduction of the Results of Ab Initio Supermolecule Computations by an Additive Procedure. N. Gresh, P. Claverie, and A. Pullman, Theor. Chim. Acta, 66, 1 (1984). Theoretical Studies of Molecular Conformation. Derivation of an Additive Procedure of the Computation of Intramolecular Interaction Energies. Comparison with Ab Initio SCF Computations. N. Gresh, A. Pullman, and P. Claverie, Theor. Chim. Acta, 67, 11 (1985). Theoretical Studies of Molecular Conformation. II. Application of the SIBFA Procedure to Molecules Containing Carbonyl and Carboxylate Oxygens and Amide Nitrogens.
28. J. Yvon, Actualités Scientifiques et Industrielles, Vol. 203, Hermann, Paris, 1935. La théorie statistique des fluides et l’équation d’état.
29. J. L. Greffe and J. Barriol, C.R. Acad. Sci. Paris, 270C, 253 (1970). Contribution au calcul statistique du facteur g de Kirkwood pour des liquides polaires purs.
30. L. Verlet, Phys. Rev., 159, 98 (1967). Computer “Experiments” on Classical Fluids. I. Thermodynamical Properties of Lennard-Jones Molecules. L. Verlet, Phys. Rev., 165, 201 (1968). Computer “Experiments” on Classical Fluids. II. Equilibrium Correlation Functions.
31. Y. Guissani, B. Guillot, and S. Bratos, J. Chem. Phys., 88, 5850 (1988). The Statistical Mechanics of the Ionic Equilibrium of Water: A Computer Simulation Study.
32. G. Wipff and L. Troxler, in Computational Approaches in Supramolecular Chemistry, G. Wipff, Ed., NATO ASI Series, Kluwer, Amsterdam, 1994, pp. 319-348. MD Simulations on Synthetic Ionophores and Their Cation Complexes: Comparisons of Aqueous/Non-aqueous Solvents.
33. R. Ernenwein, M.-M. Rohmer, and M. Bénard, Comput. Phys. Commun., 58, 305 (1990). A Program System for Ab Initio MO Calculations on Vector and Parallel Processing Machines. I. Evaluation of Integrals. M.-M. Rohmer, J. Demuynck, M. Bénard, R. Wiest, C. Bachmann, C. Henriet, and R. Ernenwein, Comput. Phys. Commun., 60, 127 (1990). A Program System for Ab Initio MO Calculations on Vector and Parallel Processing Machines. II. SCF Closed-Shell and Open-Shell Iterations. R. Wiest, J. Demuynck, M. Bénard, M.-M. Rohmer, and R. Ernenwein, Comput. Phys. Commun., 62, 107 (1991). A Program System for Ab Initio MO Calculations on Vector and Parallel Processing Machines. III. Integral Reordering and Four-Index Transformation.
34. S. Diner, J. P. Malrieu, and P. Claverie, Theor. Chim. Acta, 13, 1 (1969). Localized Bond Orbitals and the Correlation Problem. I. Perturbation Calculation of the Ground-State Energy. J. P. Malrieu, P. Claverie, and S. Diner, Theor. Chim. Acta, 13, 18 (1969). Localized Bond Orbitals and the Correlation Problem. II. Application to π-Electron Systems. S. Diner, J. P. Malrieu, F. Jordan, and M. Gilbert, Theor. Chim. Acta, 15, 100 (1969). Localized Bond Orbitals and the Correlation Problem. III. Energy Up to the Third Order in the Zero-Differential Overlap Approximation. Application to σ-Electron Systems. F. Jordan, M. Gilbert, J. P. Malrieu, and U. Pincelli, Theor. Chim. Acta, 15, 211 (1969). Localized Bond Orbitals and the Correlation Problem. IV. Stability of the Perturbation Energies with Respect to Bond Hybridization and Polarity. P. Claverie, J. P. Daudey, S. Diner, C. L. Giessner-Prettre, M. Gilbert, J. Langlet, J. P. Malrieu, U. Pincelli, and B. Pullman, Quantum Chemistry Program Exchange, Bloomington, IN. QCPE Program No. 220, PCILO: Perturbation Configuration Interaction Using Localized Orbital Method in the CNDO Hypothesis. J. Langlet, J. P. Malrieu, J. Douady, Y. Ellinger, and R. Subra, Quantum Chemistry Program Exchange, Bloomington, IN. QCPE Program No. 327, PCIRAD: The PCILO Method Extended to Localized Open Shell Systems. J. Douady, B. Barone, Y. Ellinger, and R. Subra, Quantum Chemistry Program Exchange, Bloomington, IN. QCPE Program No. 371, PCILINDO: The PCILO Method in the INDO Approximation.
35. B. Pullman and A. Pullman, Adv. Protein Chem., 28, 347 (1974). Molecular Orbital Calculations on the Conformation of Amino Acid Residues of Proteins.
36. D. Rinaldi and J.-L. Rivail, C.R. Acad. Sci. Paris, 274C, 1664 (1972). Recherche rapide de la géométrie d’une molécule à l’aide des méthodes LCAO semiempiriques ne faisant intervenir que des intégrales mono- et bicentriques. D. Rinaldi, Quantum Chemistry Program Exchange, Bloomington, IN. QCPE Program No. 290, GEOMO: A System of Programs for the Quantitative Determination of Molecular Geometries and Molecular Orbitals.
37. D. Rinaldi, P. E. Hoggan, and A. Cartier, Quantum Chemistry Program Exchange, Bloomington, IN. QCPE Program No. 584, GEOMOS: Semiempirical SCF System Dealing with Solvent Effects and Solid Surface Adsorption.
38. J.-E. Dubois, D. Laurent, and H. Veillard, C.R. Acad. Sci. Paris, 263C, 764 (1966). Système de documentation et de recherches de corrélations (DARC). Principes généraux. J.-E. Dubois, D. Laurent, and H. Viellard, C.R. Acad. Sci. Paris, 263C, 1245 (1966). Système DARC. Description structurale et polymatricielle (DSP). Écriture des matrices formelles.
39. M. Mayot, H. Berthod, G. Berthier, and A. Pullman, J. Chim. Phys., 53, 774 (1956). Calcul des intégrales polycentriques relatives à l’étude des structures moléculaires. I. Intégrale tricentrique homonucléaire du type Coulomb-échange.
40. Centre Charles Hermite. http://www.loria.fr/CCW
Reviews in Computational Chemistry, Volume 12. Edited by Kenny B. Lipkowitz and Donald B. Boyd. Copyright © 1998 by Wiley-VCH, Inc.
Author Index Abell, G. C., 238 Abeygunawardana, C., 61 Abraham, F. E., 69 Abrarnowitz, M., 203 Ackers, G. K., 325 Adant, C., 274,278 Agren, H., 277,279 Aiga, F., 277 Akke, M., 61 Alagona, G., 323 Albert, I. D. L., 279 Alder, B. J., 61, 202, 237 Alexandrowicz, Z., 6 2 , 6 4 Al-Laham, M. A., 360 Allan, D. C., 360 Allen, A. T., 135 Allen, F. H., 360 Allen, M. P., 61, 135,202,203, 324 Allinger, N. L., 237,239, 323,359, 363 Allison, S. A., 326 Alonso, J. A., 201 Amadei, A., 325 Ammon, H., 360 Andersen, H. C., 136 Anderson, A. G., 63, 70, 71 Anderson, T. W., 323 Andr6, D., 365 Andres, J. L., 360 Andrews, L. C., 359,362 Andrews, S. J., 359, 365 Apra, E., 360 Aqvist, J., 69, 73, 325 Archontis, G., 71, 72 Arfken, G., 136 Arias, T. A., 360 Arikawa, T., 36 Ashcroft, N. W., 202 Ashwell, G. J., 275 Atkins, P. W., 275 Attard, P., 202 Auffinger, P., 202 Austin, N. E., 365
Avdeev, V. I., 199 AvilCs, F. X., 71 Ayala, P. Y., 360 Bachmann, C., 379 Bachrach, S. M., 360 Bader, J. S., 72,324 Baker, J., 360 Balbes, L. M., 237 Baldridge, K. K., 278,360 Balluffi, R. W., 239 Barabino, G., 201 Baranyai, A., 136 Barber, M., 203 Barford, R. A., 73 Barker, J. A., 66, 69 Barnes, A. J., 134 Barnes, P., 200 Barojas, J., 135 Barone, B., 379 Barriol, J., 379 Bartlett, R. J., 275, 276, 277, 278 Bartolotti, L. J., 201, 237 Basch, H., 205 Bascle, J., 62 Bash, P. A,, 69, 205 Baskes, M. I., 237, 239 Baums, D., 275 Baur, W. H., 364 Bayly, C. I., 326, 361 Baysal, C., 73 Beazley, D. M., 239 Becke, A. D., 278 Becker, J. M., 61 Beers, Y.,200 Belch, A. C., 67 Belhadj, M., 72, 325 Bell, C. D., 68 Bellemans, A., 134 BCnard, M., 379 Benjamin, I., 199,202,205 Ben-Naim, A., 64
382 Author Index Bennett, C., 64 Berard, D. R., 200,201,202,204 Berendsen, H. J. C., 66,69,70, 71,134, 136, 199,200,323,324,325,363 Berens, P. H., 136 Bereolos, P., 64 Berg, B. A., 62, 63, 74 Berkowitz, M., 67,201,202 Berkowitz, M. L., 202,203 Bernardo, D. N., 324 Berne, B. J., 67, 134, 200,205 Bernstein, H. J., 359, 362 Bernstein, J., 359 Berthier, G., 377, 378, 380 Berthod, H., 380 Bertie, J. E., 364 Bertolini, D., 200 Bertoni, C. M., 202 Besler, B. H., 361 Beutler, T. C., 67, 70, 71 Beveridge, D. L., 61,63, 67, 69,202 Bhanot, G., 74 Bienstock, R. J., 65 Binder, K., 61,66 Bishop, D. M., 274, 276,278 Biswas, R., 238 Bixon, M., 134 Bjork, A., 136 Black, S., 74 Black, S. N., 359 Blake, J. F., 71 Bleil, R. E., 323 Block, S., 365 Bloor, D., 275, 278 Boatz, J. A., 278, 360 Bocker, J., 200,201 Boczko, E. M., 67 Bogusz, S., 203 Bolcer, J. D., 377 Boles, J. O., 326 Bolger, M. B., 70 Boresch, S., 71 Borges, G. L., 204 Boris, J. P., 136 Born, M., 324 Boudon, S., 70 Boultif, A., 364 Bouzida, D., 67, 204 Bowen, J. P., 237,323,359 Bowles, R. K., 66 Boyd, D. B., v, vi, vii, 63, 64, 134, 135,201, 202,205,236,237,277, 323, 359,360, 363,364,377,378
Boyd, R. J., 324 Boyd, R. W., 275 Brady, J.%64 Bratos, S., 379 Bredas, J. L., 274,276,278 Bremi, T., 67 Breneman, C. M., 361 Brenner, D. W., 237,238, 239 Briels, W. J., 68 Brillouin, L., 378 Brinkley, J. S., 360 Brion, H., 377, 378 Brock, C. P., 359 Brodsky, A. M., 74,204 Brook, R. J., 275 Brooks, B. R., 64,65,203,323,326 Brooks, C. L., 111, 63,67, 68, 69, 70, 135, 136,323 Brown, B. C., 66 Brown, F. K., 69,70 Bruccoleri, R. E., 64,323 Bruggink, A., 361 Bruins Slot, H. J.%361 Briinger, A., 70, 325 Bruni, F., 200 Bruschweiler, R., 61 Buckingham, A. D., 275 Buckley, A., 275 Buckley, A. M., 359 Buckner, J. K., 68, 70 Buerger, H. M., 364 Bullock, J. F., 365 Burgess, A. W., 64 Burkert, U., 239 Burkhard, A., 364 Burland, D. M., 275 Busetta, B., 364 Busing, W. R., 359, 363 Byrn, S. R., 365 Cahn, W., 275 Calhoun, A., 199 Calvin, M. D., 199 Cammi, R., 279 Car, R., 237 Card, D. N., 74 Carlson, H. A., 71, 72, 73,325 Carlsson, A. E., 238 Carnie, S. L., 200 Carter, P., 74 Cartier, A,, 380 Case, D. A., 65, 323 Cassettari, M., 200
Author lndex 383 Catlow, C. R. A., 361 Causa, M., 360 Ceperley, D. M., 202,237 Chadi, D. J., 238 Chagas, C., 69 Chaka, A. M., 360 Challacombe, M., 360 Chalvet, O., 378 Champagne, B., 278 Chan, C. T., 238 Chandler, D., 72, 134, 324 Chandrasekhar, I., 136 Chandrasekhar, J., 68, 199, 324 Chang, G., 364 Chang, I., 62 Chao, K.-C., 64 Cheeseman, J. R., 360 Chen, S., 279 Chen, S.-w., 68 Chen, W., 360 Cheng, L.-T., 276 Chialvo, A. A., 324 Chin, S., 276, 278 Chirlian, L. E., 361 Choi, C., 135 Chu, Z.-T., 63, 70, 73 Ciccotti, G., 134, 135, 203 Cieplak, P., 70, 325,326, 361 Cioslowski, J., 236, 360 Clarke, J. H. R., 203 Claverie, P., 378, 379 Clementi, E., 277 Clough, S. A., 200 Cohen, D., 205 Cohen, F. E., 364 Cohen, H. D., 275 Colton, R. J., 238 Colwell, S. M., 278 Coombes, D. S., 362 Cornell, W. D., 326,361 Cossi, M., 279 Cottier, D., 359 Coulson, C. A., 200, 378 Courseille, C., 364 Covell, D. G., 69, 72 Cox, S . R., 326, 360 Cramer, C. J., 63 Cram&, H., 64 Cross, A. J., 69 Cross, P. C., 136 Cummings, P. T., 324 Cun-xin, W., 70 Cushman, J. H., 202
Cyrot-Lackmann, F., 238 Czerminski, R., 135 Dacol, D., 323 Dagani, R., 275 Daggett, V., 70 Dahlquist, G., 136 Dang, L. X., 67,68,70 Darden, T., 203 Dauber-Osguthorpe, P., 65 Daudel, R., 377, 378 Daudey, J. P., 379 Daura, X., 71 Davey, R. J., 359 Davidson, E. R., 236,277, 363 Davis, M. E., 326 Davis, T. F., 74 Daw, M. S., 237 Dawnkaski, E. J., 238 Day, P. N., 205 De Groot, B. L., 325 de Pablo, J. J., 64 Dean, P. M., 63 DeBaerdemaeker, T. D. J., 364 DeBolt, S . E., 136 Decius, J. C., 136 DeFrees, D. J., 360 Del Buono, G. S., 325 Dempsey, P., 359 Demuynck, J., 379 Dennis, J. E., 362 Denti, T. Z. M., 71,72 DePristo, A. E., 239 Dewar, M. J. S . , 276 di Bella, S., 279 DiCapua, F. M., 61 Dickson, R. M., 278 Diederich, F., 71, 72 Diestier, D. J., 202 Dieter, K. M., 275 Diner, S., 379 Ding, Y., 324 DiNola, A., 66 Dinur, U., 323, 359 Dirk, C. W., 276 Docherty, R., 362 Dorn, R., 275 Douady, J., 379 Dovesi, R., 360 Drabold, D. A., 238 Du, Z., 360 Dubois, J.-E., 380 Dudis, D. S., 277, 279
384 Author lndex Duffy, E. M., 72 Dunitz, J. D., 359 Dunlap, B. J., 278 Dunning, T. H., Jr., 276 Dupuis, M., 276, 277, 278, 360 Durell, S. R., 71 Dykstra, C. E., 275, 276 Dzyabchenko, A. V., 362 Eastwood, J. W., 136 Eaton, D. F., 275 Eck, B., 200 Edberg, R., 136 Edholm, O., 66 Eerden, J. V., 68 Eisenberg, D., 200 Eisenmenger, F., 74 Elber, R., 65, 135 Elbert, S. T., 278, 360 Elert, M. L., 238 Ellinger, Y., 379 Englert, U., 363 Eno, L,, 323 Ercolessi, F., 238 Erickson, B. W., 72 Eriksson, L., 364 Ernenwein, R., 379 Ernst, R. R., 6 7 Erpenbeck, J. J., 62 Escobedo, F. A., 64 Esling, C., 359 Esselnik, K., 66 Essex, J. W., 71, 361 Essmann, U., 203 Etchebest, C., 378 Evans, D. J., 135, 136,204 Evans, W. A. B., 135 Eyring, H., 237 Failor, B. H., 136 Fedders, P. A., 238 Feil, D., 68, 363 Feller, D., 236, 277 Feller, S. E., 203 Ferrante, J., 238 Ferrario, M., 135,203 Ferrenberg, A. M., 74 Fersht, A. R., 325 Fickett, W., 64 Field, M. J., 205 Filippini, G., 359, 364 Fincham, D., 135
Finney, J. L., 200 Finnis, M. W., 237 Fischer, J., 66 Fisher, G. B., 200 Fixman, M., 66, 134 Flannery, B. P., 203, 361 Flemings, M. C., 275 Flurchick, K., 201, 237 Flytzanis, C., 276 Foiles, S. M., 238 Foresman, J. B., 360 Forester, T. R., 136 Forsythe, G. E., 323 Foster, K., 201 Foulkes, W. M. C., 237 Fourme, R., 365 Fox, D. J., 360 Franck, P., 323 Francl, M. M., 361 Franken, P. K., 276 Fraternali, F., 68 Frauenheim, T., 238 Frenkel, D., 66,202,276 Friedman, R. A., 68 Friedman, R. S., 275 Friere, E., 325 Friesner, R. A., 66 Frisch, M. J., 360 Frye, J., 365 Fuhua, H., 70 Funk-Kath, U., 364 Galazka, W., 63 Gans, P. J., 62 Gao, J., 68, 70, 205 Garcia, A., 325 Gardner, A. A., 204 Garel, T., 6 2 Garmer, D., 205 Garrison, B. J., 238 Gasteiger, J., 360 Gavezzotti, A., 359,361, 362,364 Gavotti, C., 201 Gdanitz, R. j., 362, 363 Gear, C. W., 136 Gelin, B. R., 61,134 Genest, M., 65 Cerber, P. R., 70, 325 Gerhardts, R. R., 201 Ghio, C., 323 Gibbs, J. W., 64 Gibson, J. B., 237
Author Index 385 Gibson, K. D., 65, 359,361 Gierasch, L. M., 65 Gies, P., 202 Giessner-Prettre, C. L., 379 Gilbert, M., 379 Gill, P. M. W., 360 Gilson, M. K., 68 Gladney, H. M., 378 Gland, J. L., 200 Glosli, J .N., 204, 205 Go, M., 65 GO, N., 61, 65, 135, 326 Goddard, W. A., 111, 360, 361, 364 Golab, J. T., 205 Goland, A. N., 237 Goldstein, H., 135, 203 Gomperts, R., 360 Gonzalez, C., 360 Goodwin, P. D., 238 Gordon, J. G., 204 Gordon, M. S., 205,278, 360 Gosh, I., 69 Gough, C. A., 135 Grahame, D. C., 199 Gramaccioli, C. M., 359 Grassberger, P., 62 Green, C. D., 200 Green, S. M., 325 Greengard, L. F., 203 Greffe, J. L., 379 Gresh, N., 379 Grigera, J. R., 136, 200 Guirdia, E., 67 Gubbins, K. E., 64, 135 Guest, M., 361 Guida, W. C., 364 Guillot, B., 379 Guissani, Y., 379 Gurskii, Z., 201 Guttman, C. M., 62 Haak, J. R., 69 Hagler, A. T., 61, 62, 65, 323, 359, 364 Hahn, T., 359 Haisa, M., 364 Halley, J. W., 199, 201, 324 Hammersley, J. M., 64 Hammonds, K. D., 135 Handscombe, D. C., 64 Handy, N. C., 277 Hann, R. B., 278 Hansen, J.-P., 61, 66
Hansmann, U. H. E., 63,74 Hansson, T., 73 Hao, M.-H., 63,74 Harkema, S., 68 Harp, G. D., 134 Harris, J., 237 Harrison, J. A., 238, 239 Harrison, N. M., 360 Harvey, S. C., 64,68,323,378 Hawk, J., 364 Hautman, J., 203 Haydock, R., 237 Hayward, S., 325, 326 Head, J. D., 199 Head-Gordon, M., 360 Hegger, R., 62 Heine, V., 238 Heinzinger, K., 199,200,201,202,203 Helfand, E., 135 Henderson, D., 66, 69,201, 204 Hendrickson, T. F., 62 Henriet, C., 379 Hermann, R. B., 377 Hermans, J., 63, 69, 70, 71, 72,200, 324 Hesselink, F. T., 65 Heyes, D. M., 203 Hill, T. L., 63 Hiller, L. A., 62 Hirano, T., 363 Hirara, F., 326 Hirono, S., 70 Hirschfelder, J., 237 Ho, K. M., 238 Hoarau, J., 377 Hockney, R. W., 136 Hodes, R. S., 72 Hoffmann, R., 363 Hofmann, D. W. M., 360 Hogenson, G. J., 74 Hoggan, P. E., 380 Hohenberg, P., 201, 237 Holden, J. R., 360 Holian, B. L., 239 Hooft, R. W. W., 67 Hoover, W. G., 61,66, 136,203 Horsfield, A. I?, 238 Hospital, M., 364 Houk, K. N., 364 Howard, J. N., 204 Hshitsume, N., 64 Huang, K., 63,324 Hummer, G., 73,203
386 Author Index Hunenberger, P. H., 71, 324 Hunt, N. G., 364 Hunter, J. E., 111, 74 Huron, B., 378, 379 Hurst, G. J. B., 277 Huston, S. E., 68, 324 Hwang, J.-K., 69, 72 Hwang, M. J., 364 Ibers, J. A., 359 Ichimura, H., 64 Ichiye, T., 325 Impey, R. W., 72,199,324 Inoue, T., 276 Irikura, K. K., 65 Isaacson, L. M., 62 Ito, T., 204 Itoh, R., 277 Iwata, S., 276 Jackson, J. L., 64 Jacobsen, K. W., 239 Jacobson, J. D., 64 Jacucci, G., 67 Jancso, G., 72 Jang, S., 203 Janssen, L. H. M., 71 Jasien, P. G., 276 Jaszunski, M., 276 Jayaram, B., 202 Jensen, J. H., 205,278, 360 Jensen, L. H., 359 Jerkiewicz, G., 200 Jin, S., 199 Joannopoulos, J. D., 360 Johnson, B., 201 Johnson, B. G., 360 Johnson, K. W., 66 Jolles, G., 63 Jones, W., 365 Jones-Hertzog, D. K., 71 Jonsson, D., 277 Jordan, F., 379 Jrargensen, P., 277, 279 Jorgensen, W. L., 63,68, 69, 70, 71, 72,73, 199,324,325,326 Julg, A., 377, 378 Kahn, M., 64 Kaino, T., 275 Kaiwa, R., 364 Kalos, M. H., 64
Kanis, D. R., 274, 276,279 Karasawa, N., 361 Karen, V. L., 363 Karfunkel, H. R., 362, 363, 364, 365 Karirn, 0. A,, 67 Karna, S. P.,276,277,278 Karplus, M., 61,63,64,65,66,67,68, 70, 71,72,134,205,323,324,325,326 Kashino, S., 364 Kassner, D., 364 Kastler, D., 378 Kato, H., 362 Katz, H., 135 Kay, L. E., 61 Keith, T., 360 Kernball, C., 200 Kendrick, J., 361 Kennard, O., 360 Kersten, P., 275 King, G., 73 Kinoshita, M., 201 Kirkwood, J. G., 61 Kirtman, B., 199,278 Kitaigorodskii, A. I., 359 Kitao, A., 326 Kitchen, D. B., 72, 325 Kitson, D. H., 65 Klein, G. P., 200 Klein, L. S., 64 Klein, M., 203 Klein, M. L., 72, 135, 199,200, 203, 324 Kleinman, D. A., 275 Kloprnan, G., 360 Klug, D. D., 364 Kobayashi, H., 204 Kobayashi, N., 326 Koerber, S. C., 62, 65 Kohler, T., 238 Kohlrneyer, A., 202 Kohn, W., 201,237 Kolinski, A., 63 Kollman, P. A., 61, 63, 67, 68, 69, 70, 135, 136,323,325,326,361 Konishi, Y., 66 Korambath, P. K., 278 Koseki, S., 278, 360 Koster, G. F., 238 Kramer, M., 323 Krauss, M., 205 Kriebel, C., 66 Krogh-Jespersen, K., 324 Kroon, J., 67,361, 362,363
Author Index 387 Krynicki, K., 200 Kubo, R., 64 Kuczwra, K., 70 Kuharski, R. A., 72 Kumar, S., 67, 204 Kurnar, S. K., 72 Kurtz, H. A., 275, 278,279 Kusalik, P. G., 324 Kushick, J. N., 61,66, 324 Kutteh, R., 136, 202 Kuzyk, M. G., 276 Kvick, A., 359 Kwon, I., 238 Ladd, A. J. C., 136 Lahav, M., 359 Lambrakos, S. G., 136 Lanczos, C., 135 Lang, N. D., 201,239 Langlet, J., 379 Langley, D. R., 202 Langridge, R., 69 Larter, R., 64 Laurent, D., 380 Lavery, R., 63,378 Le Page, Y., 359,364 Leach, A. R., 364 Lebowitz, J. L., 200 Lee, A. M., 278 Lee, C. Y., 66,67,204 Lee, F. S., 70 Lee, H., 203 Lee, J., 63, 65 Lefebvre, R., 377, 378 Leiserowitz, L., 359 Lengauer, T., 360 Leslie, M., 361 Leusen, F. J. J., 361,362,363,365 Levesque, D., 66, 135 Levitt, M., 326 LCvy, B., 378 Levy, R. M., 66,72,324,325 Lewis, P. A., 365 Li, L., 72 Li, X. P., 238 Li, Z., 66 LiCata, V. J., 325 Liljefors, T., 360 Lim, D., 71 Linderberg, J., 277 Lindsay, G. A., 275 Linssen, A. B. M., 325
Lipkowitz, K. B., v, vi, vii, 63, 64, 134, 135, 201, 202, 205, 236, 237, 277, 323, 359, 360, 363, 364, 377, 378 Lipton, M., 364 Liu, H., 72 Liu, S.-Y., 275 Lobaugh, J., 200 Lomdahl, P. S., 239 Louer, D., 364 Lovell, R., 365 Luo, Y., 277, 279 Luque, F. J., 364 Luty, B. A., 326 Lybrand, T. P., 63, 69, 134, 202, 323, 363 MacCrackin, F. L., 62 MacKerell, A. D., Jr., 70 Madey, T. E., 200 Madura, J. D., 199, 324 Maeda, H., 364 Maginn, S. J., 359, 365 Mahajan, S., 275 Main, P., 365 Malcolm, M. A., 323 Malik, D. J., 275 Malrieu, J. P., 378, 379 Manger, C. W., 365 Mansfield, M. L., 66 March, N. H., 201 Marchesi, M., 201 Marcus, R. A., 72, 73 Marder, S. R., 274, 275, 276 Marichal, G., 134 Margenau, H., 275 Mark, A. E., 63, 70, 71, 72, 324, 325 Marks, T. J., 274, 276, 279 Maroulis, G., 276 Marrakchi, A., 275 Marrone, T. J., 68 Marshall, A. W., 64 Marsili, M., 360 Martin, R. L., 360 Martyna, G. J., 203, 205 Mascarella, S. W., 237 Matouschek, A., 325 Matsunaga, N., 278, 360 Maye, P. V., 68, 72 Mayo, S. L., 364 Mayot, M., 377, 380 Mazur, J., 62 McCammon, J. A., 61, 63, 64, 67, 68, 69, 71, 72, 134, 135, 204, 323, 324, 326
McGrath, E., 135 McGuire, R. F., 64 McDonald, I. R., 61, 66, 135 McDonald, N. A., 71 McWeeny, R., 276 Medina, C., 73, 325 Mehrotra, P. K., 67 Meirovitch, E., 64, 65 Meirovitch, H., 62, 64, 65, 66, 73 Melroy, O. R., 204 Memon, M. K., 136 Mennucci, B., 279 Menon, M., 238 Meredith, G. R., 275 Mermin, D. N., 202 Mertz, J. E., 135 Merz, K. M., Jr., 70, 361 Methfessel, M., 237 Metropolis, N., 61, 203, 363 Meyer, M., 203 Meyers, F., 276 Mezei, M., 63, 67, 68, 69, 70, 71, 72 Michael, A., 323 Mighell, A. D., 363 Mika, K., 364 Mikkelsen, K. V., 279 Mildvan, A. S., 61 Milgram, M., 237 Miller, J. S., 275 Miller, W. H., 237 Mitchell, M. J., 71 Mitra, S. K., 136 Miyamoto, S., 135 Moler, C. B., 323 Momany, F. A., 64, 326, 360 Mon, K. K., 74 Montgomery, J. A., 278, 360 Montroll, E. W., 200 Mooij, W. T. M., 361, 363 Moran, B., 136 Moriguchi, I., 70 Morley, J. O., 276, 278 Morris, G. P., 136 Moser, C., 377, 378 Mostow, M., 62 Motakabbir, K., 201, 202 Mougenout, P., 278 Mountain, R. D., 204, 324 Mowbray, S. L., 73 Mruzik, M. R., 69 Müller, A., 66 Muller, J., 359
Mulliken, R. S., 360 Murad, S., 135, 204 Murphy, G. M., 275 Nachbar, R. B., Jr., 323, 324 Nagi, G. K., 362 Nagumo, M., 136 Naider, F., 61 Nanayakkara, A., 360 Nazmutdinov, R. R., 199, 200, 204 Ndip, E. M. N., 362 Needs, R. J., 237 Nelson, J. S., 239 Némethy, G., 62, 64 Neuhaus, T., 62 Newton, C. G., 63 Nguyen, K. A., 278, 360 Nguyen, T. B., 72 Nicholas, J. B., 136 Nicholas, J. D., 200 Nishikawa, T., 326 Noguti, T., 326 Noordik, J. H., 359, 361 Norman, P., 277 Norskov, J. K., 239 North, R. J., 365 Northrup, S. H., 67, 326 Nosé, S., 203 Nunes, W., 238 O’Connell, T., 69, 72 Oddershede, J., 277 Odiot, S., 377 Öhrn, N. Y., 277 Oikawa, S., 362 Okamoto, Y., 63, 74 Okazaki, K., 365 Olafson, B. D., 64, 323, 364 Olsen, J., 277 Ooi, T., 65 Oran, E. S., 136 Orban, J., 136 Orland, H., 62 Orlando, R., 360 Ornstein, R. L., 325 Orozco, M., 364 Orr, B. J., 276 Ortiz, J. V., 360 Orville-Thomas, W. J., 134 Osguthorpe, D. J., 65 Ossicini, S., 202 Owicki, J. C., 69
Padmaja, N., 364 Padró, J. A., 67 Palmer, A. G., 61 Panagiotopoulos, A. Z., 72 Pangali, C., 67 Parkinson, W. A., 277 Parrinello, M., 237, 238 Parsonage, N. G., 64 Pastor, R. W., 203 Patey, G. N., 66, 200, 201, 202, 203, 204 Paulsen, M. D., 325 Paulus, E. F., 365 Pavlider, P., 276 Payne, M. C., 238, 360 Payne, P. W., 67 Payne, R. S., 362 Pear, M. R., 67, 134 Pearlman, D. A., 70, 135, 325 Pecina, O., 202, 204 Pedersen, L. G., 203 Pelletier, G. L., 362 Peng, C. Y., 360 Perahia, D., 66, 324, 326 Perdew, J. P., 237 Perera, L., 203 Perez, S., 362 Perkyns, J. S., 71 Perlstein, J., 362 Perrin, E., 278 Persoons, A., 274 Peterson, G. A., 360 Pettersson, I., 360 Pettitt, B. M., 63, 323 Pettitt, M., 71 Philippe, M.-J., 359 Philpott, M. R., 201, 204, 205 Pierce, B. M., 274, 276 Piermarini, G. J., 365 Pincelli, U., 379 Pinches, M. R. S., 365 Plowman, R., 359 Pohorille, A., 204 Polatoglu, H. M., 237 Pontikis, V., 203 Pople, J. A., 360 Popovitz-Biro, R., 359 Porezag, D., 238 Porter, J. D., 204 Postma, J. P. M., 69, 200, 324 Potts, G. D., 365 Pound, G. M., 69 Powles, J. G., 135
Prasad, P. N., 275, 278 Pratt, L. R., 204 Press, W. H., 203, 361 Price, D. L., 199, 201 Price, S. L., 361, 362 Price, S. P., 361 Prigogine, I., 275 Probst, M., 199 Prod’hom, B., 72 Profeta, S., Jr., 323 Pugh, D., 276, 278 Pullman, A., 377, 379, 380 Pullman, B., 69, 377, 379 Purisima, E. O., 66 Purvis, G. D., III, 275 Puska, M. J., 239 Quentrec, B., 135 Querol, E., 71 Quinn, J. E., 200 Quirke, N., 67 Rabin, H., 276 Rabitz, H., 323, 324 Rablen, P. R., 71 Raecker, T. J., 239 Raghavachari, K., 360 Raghavan, K., 201 Rahman, A., 134, 237, 324 Ramakumar, S., 364 Rancurel, P., 378 Rao, M., 67 Rappé, A. K., 360 Ratner, M. A., 274, 276, 279 Ravimohan, C., 69 Ravishanker, G., 202 Read, A. J., 237 Rebertus, D. W., 134 Ree, F. H., 61 Regener, R., 275 Reinhardt, B. A., 275 Reinhardt, W. P., 73, 74, 204 Renaud, M., 365 Replogle, E. S., 360 Resat, H., 68, 70, 71, 72 Rey, R., 67 Reynolds, C. A., 361 Reynolds, J. C. L., 326 Ricci, M. A., 200 Rice, B. M., 361 Rice, J. E., 68, 274, 275, 276, 277, 278 Rice, S. A., 275
Richards, W. G., 361 Richer, J., 204 Rick, S. W., 200 Rietveld, H. M., 364 Rihs, G., 362, 364 Rinaldi, D., 378, 379, 380 Rivail, J.-L., 378, 379 Rivier, J., 62, 65 Rizo, J., 65 Robb, M. A., 360 Roberts, R. J., 362 Roberts, V. A., 65 Robertson, D. H., 238 Robertson, I. J., 238 Robinson, G. W., 324 Rocklin, V., 203 Roetti, C., 360 Rohde, B., 362 Rohmer, M.-M., 379 Rojas, O. L., 66 Rojnuckarin, A., 203 Roothaan, C. C. J., 275 Rose, D. A., 202, 205 Rose, J. H., 238 Rosenberg, J. M., 67 Rosenbluth, A. W., 61, 62, 203, 363 Rosenbluth, M. N., 61, 62, 203, 363 Ross, M., 66 Rossky, P. J., 67, 68, 204, 324 Rothman, L. S., 200 Rout, J. E., 359 Roux, B., 68, 325 Rowe, R. C., 362 Rubin, R. J., 62 Russell, S. T., 325 Rustad, J. R., 324 Ryckaert, J. P., 134, 135, 136, 203 Sabin, J. R., 274 Sadlej, A. J., 277, 278 St-Amant, A., 237 Salahub, D. R., 204, 278 Saleh, B. E. A., 275 Salem, L., 378 Salsburg, Z. W., 64 Salvador, R., 74 Salvetti, G., 200 Samuelsson, J.-E., 73, 325 Sander, C., 326 Sarko, A., 359 Sasagane, K., 277 Saunders, M., 364
Saunders, V. R., 360 Sawyer, D. W., 200 Scaringe, R. P., 362 Schaftenaar, G., 361 Scheek, R. M., 325 Scheffler, M., 360 Scheraga, H. A., 61, 62, 63, 64, 65, 66, 69, 73, 74, 135, 359, 361 Schlegel, H. B., 360 Schmickler, W., 199, 201, 202, 204 Schmidt, M. U., 363 Schmidt, M. W., 278, 360 Schnabel, R. B., 362 Schneider, S. E., 72 Schoeffel, K., 361 Schoen, M., 202 Schon, J. C., 74 Schreiber, D. E., 69 Schreiber, H., 325 Schutt, C., 323, 324 Schwalm, M., 201 Schweighofer, K. J., 202 Scott, H. L., 66 Scrivastava, D., 238 Seifert, G., 238 Sekino, H., 276, 277, 278 Sellers, H., 199, 205 Selmi, M., 71 Serrano, L., 325 Severance, D. L., 71 Sham, L. J., 201, 237 Sham, Y. Y., 73 Shankar, S., 69 Sharon, R., 61 Shelley, J. C., 200, 203, 204 Shelton, D. P., 274, 275 Shen, J., 324 Shenderova, O., 237, 239 Sherwood, P., 361 Shing, K. S., 64 Shoda, T., 362, 365 Shortle, D., 325 Showalter, K., 64 Siepmann, J. I., 200, 201 Silva, S. J., 199 Sim, F., 276, 278 Simonson, T., 70, 325, 326 Sinclair, J. E., 237 Singer, K., 61, 66 Singer, K. D., 275 Singh, S., 324 Singh, U. C., 69, 135, 323, 326
Sinnott, S. B., 238, 239 Sinnreich, D., 364 Sippl, M. J., 64 Sklenar, H., 378 Skolnick, J., 63 Slater, J. C., 238 Smit, B., 66, 202 Smith, D. E., 67 Smith, E. R., 200 Smith, F. R., 325 Smith, J. A., 72 Smith, J. R., 238 Smith, P. E., 70, 71, 72 Smith, S. F., 68 Smith, S. J., 378 Smith, W., 136 Sneddon, S. F., 68 Sohn, J. E., 274, 275 Sommer, M. S., 70 Somorjai, R. L., 135 Soper, A. K., 200 Sorensen, L. B., 204 Sorescu, D. C., 361 Soriaga, M. P., 200 Soukoulis, C. M., 238 Spackman, M. A., 277 Speedy, R. J., 66 Spek, A. L., 363 Spohr, E., 199, 200, 201, 202, 203, 204 Sprik, M., 72, 200, 201, 324 Squire, D. R., 61 Stanek, J., 364 Stanley, D. R., 359 Stanton, J. F., 237, 277 States, D. J., 64, 323 Stefanov, B. B., 360 Stegun, I. A., 203 Steinhauser, O., 325 Steppe, K., 362 Stern, P. S., 61, 326 Stevens, W. J., 205 Stewart, J. J. P., 275, 276, 360 Still, W. C., 364 Stillinger, F. H., 65, 134, 200, 237 Stivers, J. T., 61 Stockfisch, T. P., 364 Stout, G. H., 359 Straatsma, T. P., 63, 135, 136, 200, 202, 323 Struve, W. S., 365 Stuart, S. J., 200 Stucky, G. D., 274, 275 Stumm, P., 238
Stumpf, R., 360 Su, S., 278, 360 Subbaswamy, K. R., 238 Subra, R., 379 Sudhakar, P. V., 199 Sukarai, T., 363 Susnow, R., 323, 324 Sussman, F., 69 Sutcliffe, B. T., 378 Sutton, A. P., 238, 239 Sutton, P. A., 365 Svishchev, I. M., 324 Swaminathan, S., 64, 69, 323 Swendsen, R. H., 67, 74, 204 Swope, W. C., 136 Szabo, A., 66, 73 Szewczuk, Z., 66 Szleifer, I., 72 Tackx, P., 274 Tajima, T., 363 Talbot, J., 64 Tanabe, K., 363 Tanaka, T., 363 Tang, C. L., 276 Taylor, A., 359 Taylor, R. S., 238 Teich, M. C., 275 Teller, A. H., 61, 203, 363 Teller, E., 61, 203, 363 Tembe, B. L., 69 Teramae, S., 363 Tersoff, J., 237, 238 Tessier, C., 360 Teter, M. P., 360 Teukolsky, S. A., 203, 361 Thacher, T., 323 Thakkar, A. J., 276 Thiel, P. A., 200 Thirumalai, D., 204 Thompson, D. L., 361 Tidor, B., 65, 70 Tildesley, D. J., 61, 135, 202, 203, 324 Tirado-Rives, J., 63, 70, 326 Tobias, B., 365 Tobias, D. J., 67, 68, 135, 136 Tomaru, S., 275 Tomasi, J., 279 Tomovick, R., 323 Toney, M. F., 204 Topley, B., 237 Topper, R. Q., 63
Torda, A. E., 323 Torrie, G. M., 61, 200, 204 Tossatti, E., 238 Tóth, G., 199, 202, 204 Trasatti, S., 199 Tropsha, A., 69, 71, 72 Troxler, L., 72, 379 Trucks, G. W., 360 Truhlar, D. G., 63 Tse, J. S., 364 Tsuda, M., 362 Tsuda, Y., 66 Tsuzuki, S., 363 Tuckerman, M. E., 203, 205 Uhlmann, S., 238 Uosaka, K., 200 Urabe, T., 362 Ursenbach, C. P., 199 Usui, T., 64 Vaday, S., 362 Vahtras, O., 277 Vaidehi, N., 205 Valleau, J. P., 61, 66, 74, 204 van Aalten, D. M. F., 325 van de Waal, B. W., 362 van der Haest, A. D., 361 van der Spoel, D., 325 van Eijck, B. P., 67, 361, 362, 363 van Gunsteren, W. F., 63, 67, 68, 70, 71, 72, 134, 200, 323, 324, 325, 363 van Helden, S. P., 71 van Lenthe, J-, 361 Van Nuland, N. A. J., 325 van Schaik, R. C., 70 Van Zandt, L. L., 136 Vanderbilt, D., 238 Vásquez, M., 62, 65, 67, 73 Veillard, A., 378 Veillard, H., 380 Velikson, B., 62 Verlet, L., 61, 66, 136, 379 Verwer, P., 361 Vesely, F. J., 136 Vetterling, W. T., 203, 361 Vineyard, G. H., 237 Viswamitra, M. A., 364 Vogel, H. J., 325 Voter, A. F., 67 Voth, G. A., 199, 200, 203 Vukobratovic, M., 323
Wade, R. C., 326 Wainwright, T. E., 61 Wall, F. T., 62 Wallqvist, A., 69, 71, 72 Walter, R., 135 Wang, C. Z., 238 Wang, J., 66, 69, 71 Wang, L., 72 Wang, Y., 71 Ward, J. F., 276 Warshel, A., 63, 68, 69, 70, 72, 73, 205, 325 Watanabe, M., 73, 74, 200, 204 Webb, S. P., 205 Weber, T. A., 65, 134, 237 Weich, F., 238 Weiner, J. H., 134 Weiner, P., 63, 323 Weiner, S. J., 323 Weir, C. E., 365 Weissbuch, I., 359 Wells, R. D., 378 Werner, P.-E., 364 Wesolowski, T. A., 205 Westdahl, M., 364 Wheeler, D. J., 62 White, C. T., 238 Whitlock, P. A., 64 Whitman, C. P., 61 Whitten, J. L., 199 Wiberg, K. B., 361 Widder, D. V., 136 Widom, B., 64 Wieckowski, A., 200 Wiesler, D. G., 204 Wiest, R., 379 Wilkinson, A. J., 63 Willetts, A., 275, 278 Williams, D. E., 326, 359, 360, 361, 362, 363, 365 Williams, D. J., 275 Willock, D. J., 361 Wilson, E. B., Jr., 136 Wilson, K. R., 136 Wilson, M. A., 204 Wilton, R. W., 364 Windus, T. L., 278, 360 Windwer, S., 62 Winkelmann, J., 66 Wipff, G., 69, 72, 379 Witschel, W., 202 Wolff, J., 65
Wolynes, P. G., 324 Wong, C. F., 69, 323, 324, 325, 326 Wong, M. W., 360 Wood, R. H., 71 Wood, W. W., 64 Woods, R. J., 363 Woon, D. E., 276 Wright, A. F., 239 Wu, Y.-D., 364 Wu, Z. J., 364 Wynberg, H., 361 Xia, X., 202 Xu, C. H., 238 Xu, Y., 72 Yamahara, K., 365 Yamaotsu, N., 70 Yamato, T., 326 Yamazaki, M., 378 Yan, Y., 72 Yang, D., 61 Yao, S., 324 Yarwood, J., 134 Ye, X., 201 Yeates, A. T., 277, 278, 279 Yee, D., 204 Young, M. A., 202 Youngs, W., 360
Yu, H.-a., 325 Yu, J., 278 Yu, W., 324 Yue, S.-Y., 66 Yun, R. H., 70, 71 Yun-yu, S., 70 Yvon, J., 379 Zacharias, M., 135 Zakharov, I. I., 199 Zakrzewska, K., 378 Zakrzewski, V. G., 360 Zangwill, A., 201 Zaniewski, R., 360 Zerner, M. C., 274, 278 Zhang, H., 323 Zhang, J., 324 Zheng, C., 324 Zhidomirow, G. M., 199 Zhu, J., 324 Zhu, S.-b., 323, 324 Zhu, S.-B., 201 Zinn, A. S., 204 Zou, S. J., 239 Zugenmaier, P., 359 Zunger, A., 237 Zuo, L., 359 Zwanzig, R. W., 61, 325 Zyss, J., 275
Reviews in Computational Chemistry, Volume 12 Edited by Kenny B. Lipkowitz, Donald B. Boyd Copyright © 1998 by Wiley-VCH, Inc.
Subject Index Computer programs are denoted in boldface; databases and journals are in italics. Ab initio calculations, 148, 235, 344 Absolute entropy functional, 50 Absolute free energy of binding, 39 Acetamide, 37 Acetaminophen, 352, 356, 357 Acetic acid, 335, 340, 348, 349, 352 Acetylene, 271 ACMM, 344 Adaptive umbrella sampling, 28 Adiabatic switching, 58 Aggregates, 341 Alanine, 32 Alanine dipeptide, 29, 37 Alcohols, 335, 348 Alkanes, 75 Alloxan, 340 Alpha-helix, 30 Aluminum surface energy, 216 AMBER, 284 AMBER force field, 315 Amidinoindanone guanylhydrazone, 353 Analytical method of constraint dynamics, 80, 84, 89, 98, 101 Analytical potential energy function characteristics, 211 Angle-bend constraints, 82, 118, 121, 123, 130, 133 Angular distributions, 182 Anharmonic effects, 290, 313 Antamanide, 29 Antibody, 39 Argon, 209 Aromatic hydrocarbons, 342 ASTERIX, 373 Asymmetric unit, 331, 348 Atomic multipoles, 335 Atomic orbitals (AO), 213, 217, 220, 260 Atomic charges, 190, 292, 293, 301, 319, 334, 335, 347, 352, 357, 358 Atomistic simulations, 207 Avian pancreatic polypeptide (APP), 296, 321
Azurin, 36 Basis set dependence, 267 Basis sets, 210, 216, 265 3-21G, 342 6-31G, 265 6-31G*, 272, 347 6-31G**, 347, 357 6-31G + PD, 265, 267, 272 6-31G(+sd+sp), 265 6-311++G**, 266, 267 aug-cc-pVDZ, 267, 272 aug-cc-pVTZ, 267, 272 aug-cc-pVXZ, 266 cc-pVDZ, 266, 267 cc-pVTZ, 266, 267 cc-pVXZ, 266 d-aug-cc-pVDZ, 267, 272 d-aug-cc-pVTZ, 267, 272 double-zeta, 266 ELP, 265, 267 POL, 266 POL+, 266, 267, 272 Sadlej, 267, 272 Spackman, 267 STO-3G, 357 t-aug-cc-pVDZ, 267, 272 t-aug-cc-pVTZ, 272 triple-zeta, 265, 266 x-aug-cc-pVXZ, 266 Benzamidine, 32, 40 Benzene, 132, 331, 340, 342, 352, 357 Beta-sheets, 30 Bethe lattice, 229 Binding energy, 144, 217, 233 Bioactive molecules, 282 Biomolecular simulation, 281 Biomolecules, 313 Biphenyl, 254, 255 Boltzmann probability, 1, 4, 7, 12, 15, 20, 41, 43, 50, 56
Bond angle bending, 125, 284, 333 Bond angle constraint, 125 Bond distance, 211, 217, 230 Bond order, 228 Bond stretching, 118, 125, 284, 333 Bond stretching constraints, 77, 80, 83, 91, 92, 106, 115, 116, 118, 129, 130 Boundary conditions, 29, 176 Bovine pancreatic trypsin inhibitor (BPTI), 21, 28, 295, 296 Brillouin theorem, 371 Brownian dynamics, 165, 297, 314 Bulk adsorption energy, 190 Bulk materials, 242 Bulk modulus, 217, 219, 235 Bulk properties, 210 Bulk susceptibilities, 243 Bulk water, 141, 190, 195 Bulk water-semiconductor interface, 144 Butadiene, 273 Butane, 23, 79, 91 C,H,, 265 Caloric integration, 24 Cambridge Structural Database (CSD), 333, 340, 345, 347, 348, 353 Cancellation of errors, 318 Canonical (NVT) ensemble, 158, 163, 166 Carbon dioxide, 342 Carcinogenicity, 370 Car-Parrinello molecular dynamics, 144, 210 Cartesian coordinates, 77, 78, 84, 110, 155 CASTEP, 333, 344 Cell parameters, 330, 338, 347 Centre de Mécanique Ondulatoire Appliquée (CMOA), 369 Centre Européen de Calculs Atomiques et Moléculaires (CECAM), 374 Centre National de la Recherche Scientifique (CNRS), 368 CFF force field, 347, 357 Chain attrition problem, 44 Charge groups, 336 Charge models, 334 Charge sensitivities, 292, 295, 311 Charge transfer, 258 Charged metal surfaces, 195, 197 Charges, 190, 292, 293, 301, 319, 334, 335, 347, 352, 357, 358 CHARMM, 17, 284 CHELP, 335 CHELPG, 335 Chemical and Engineering News, v, ix, xi
Chemical dynamics, 209 Chemical information retrieval, 374 Chemical literature, v Chemical potential, 175 Chemical reactor, 282 Chemical stability, 327 Chemisorption, 139 Chiral compounds, 348, 352 Chromophores, 244, 309 Chymotrypsin, 32 CIPSI, 373 Close contacts, 329 Close-packed metals, 233 Clustering, 338 Coherent anti-Stokes Raman spectroscopy, 244 Cohesive energy, 217, 227, 228, 235 Collective motions, 313 Complementary error function, 157 Computational chemistry, v, 241, 367 Computational Chemistry List (CCL), vi Computer packages, 255 Computer simulation, 1, 137, 327 Computers, 374, 377 Condensed phase systems, 273 Configuration, 18 Configuration interaction (CI), 213, 257 Configuration space, 1, 12, 144, 173 Conformation, 18, 335 Conformational analysis, 347 Conformational search, 58 Conjugate momenta, 309 Constrained coordinates, 98, 103 Constrained degrees of freedom, 90 Constraint correction, 83, 99, 132 Constraint dynamics, 78, 111, 116 Constraint equations, 84 Constraint forces, 103 Constraints, 161, 165, 319, 350 Construction probability, 41, 45 Conventional cell, 331 Converged properties, 265 Convergence, 38, 47, 336 Convergence criteria, 256 Convergence of sensitivity coefficients, 294 Convergence rate, 123 Conversion factors, 251 Cooperative effects, 283, 291, 303 Cooperativity, 304, 306 Correlation energies, 213 Corrugated metal surfaces, 145, 146, 164, 176, 177, 180, 193, 194 Coulombic interactions, 155, 212
Coulomb’s law, 146, 292, 310, 334 Coupled cluster methods, 248 Coupled-cluster equations-of-motion method, 264 Coupled holonomic constraints, 80, 89 Covariance matrices, 289, 312, 313 18-Crown-6, 307 CRYSCA, 345 Crystal density, 331 Crystal packing, 336, 337 Crystal polymorphs, 327 Crystal structure prediction, 328, 339, 347, 353 Crystal surfaces, 332 Crystal95, 333, 344 Crystalline solids, 216 Crystallization, 327, 332, 339 Crystals, 251, 273, 330 Cutoff radius, 29, 154, 300, 336, 342, 343, 346 Cytochrome P-450, 311 d orbitals, 226 D’Alembert’s principle, 95 DARC system, 374 Database of physical properties, 208 de Broglie wavelength, 167 Decaglycine, 23, 48, 52 Defects, 138, 210, 218, 223, 225, 232, 234 Degenerate four-wave mixing, 246 Degrees of freedom, 76, 333 Deletions from an ensemble, 173 Density functional theory (DFT) calculations, 149, 150, 208, 209, 212, 214, 215, 219, 220, 232, 333 Density matrices, 261 Density of states, 222, 223 Density profiles, 166 Desktop mechanical calculator, 370 Detailed balance condition, 16, 56, 169, 173, 174 Diamond, 211, 219, 221, 230, 231 Diatomics, 216, 217 Dibenzoylmethane, 329 1,2-Dichloroethane, 29 DICVOL91, 347 Dielectric constant, 337 Diffuse functions, 265, 268 Diffusion coefficient, 187, 189 Diffusional-influenced reaction rates, 314 Digital computers, 367 Dimethoxyethane, 342 Dipeptides, 311
Dipole moment, 141, 149, 191, 243, 250, 254, 256, 257, 266, 292, 294, 336 Dipole moment matrix, 261 Direct methods for computing entropy, 19, 49 Displacement correlation function, 288 Distributed multipoles, 344 Distribution function, 301 DMAREL, 344 DNA, 21, 281 Docking, 334 Dodecane, 30 Domain of applicability, 123 Double excitation operators, 265 DREIDING force field, 355, 357, 358 Drug design, 209 ECEPP, 17, 21, 48, 52 EEM method, 90 Effective medium theory, 208, 226, 231 Effective pair potential, 228 Einstein harmonic oscillator formula, 21 Elastic constants, 211, 236 Electric field, 195, 196, 242, 247, 254, 256, 259, 301 Electrical response property calculations, 266 Electrode, 191 Electron correlation, 213, 254, 272, 371 Electron density, 179, 212, 214, 232, 334 Electron dispersion, 334 Electron gas, 148, 150, 213, 215, 232 Electronic energy, 214 Electrooptic modulation, 244 Electrostatic interactions, 321, 333, 336, 342, 344, 346 Electrostatic potentials, 156, 192, 319 Electrostatically derived charges, 335, 347, 352, 357, 358 Embedded-atom method, 226, 231, 233, 234 Empirical bond order model, 208, 226 Empirical potential energy functions, 17 Endothiapepsin, 310 Energies, 211 Energy fluctuation, 34 Energy minimization, 333, 337 Energy surface, 337 Enrichment method, 48 Ensemble averages, 5, 12 Ensembles, 159 Entropy, 1, 2, 5, 9, 19, 20, 22, 23, 24, 36, 41, 55, 58, 300, 303, 306, 331 Entropy functional, 50, 52 Enzyme-ligand binding, 32 Equations of motion, 82, 85, 89, 160, 162, 164
Equilibration, 165 Equivalent alternative constraints, 109, 133 Ergodic process, 15 Error function, 157 Error propagation, 314 Errors, 89, 100, 101, 130, 132, 216 Essential dynamics approach, 312 Estrone, 350, 352, 355 Ethane, 31, 37 Ethylene, 273 Euler angles, 78, 151, 170, 173 Euler equations of rotational motion, 78 Ewald summation, 156, 336, 342, 344, 346 Exchange-correlation functional, 213, 214, 215 Exchange-correlation potential, 214 Excitation energies, 257, 265 Excited state, 247 Excluded volume, 42 Extended Hückel theory (EHT), 218, 219, 343 Extensive variables, 6 False energy minima, 346 Fermi energy, 222 FHI96MD, 333, 344 Finite difference methods, 160 Finite field method, 252, 254 Finnis-Sinclair potential, 208, 220, 226, 229, 235 Fitting databases, 235 Flexcryst, 340, 345 Flexibility in crystal structures, 211, 333 Fluctuations, 6, 12, 23, 166 Fluids, 17, 138 p-Fluorobenzamidine, 32 Force fields, 2, 17, 22, 36, 37, 210, 281, 292, 315, 319, 333, 337, 347, 350 Forces of constraint, 89, 102 Fractional coordinates, 330 France, 367 Free energy, 1, 20, 32, 36, 41, 59, 295 Free energy difference calculations, 318 Free energy of binding, 32, 39, 40 Free energy of solvation, 34, 41 Free energy perturbation (FEP), 2, 39, 40 Frequency doubling, 244 Frequency upconversion lasing, 246 Frequency-dependent properties, 256, 257, 264 Friction term, 165 Friedel oscillations, 178 Fullerenes, 219
Galactose receptor, 40 GAMESS, 266, 271, 335 Gasteiger atomic charges, 341 Gaussian, 335 Gauss’s principle of least constraint, 77, 95 GEOMO, 374 Ghost molecule, 34 Gibbs free energy, 6, 41, 331, 332 Global minimum, 338 Glucocorticoid, 358 Glue model, 226 Glycine, 32 Glycine dipeptide, 29, 295, 296 Glycol, 29 GMP, 36 Gonadotropin-releasing hormone (GnRH), 52 Grain boundary, 211 Gramicidin A, 30 Grand canonical ensemble, 158 Grand canonical MC, 29, 173 Graphene sheets, 231 Green’s function, 286, 287, 289, 290, 312, 313, 314, 322 GROMOS, 284, 295 GROMOS force field, 284, 315 Ground state energy, 216 Hamiltonian, 10, 11, 22, 26, 30, 33, 38, 58, 77, 140, 147, 156, 212, 219, 221, 248, 254, 257, 264, 284, 309 Hamilton’s principle, 95 Hardware companies, 374 Harmonic approximation, 20 Harmonic entropy, 20 Harmonic force constant, 300 Harris functional, 208, 215, 216, 220 Hartree potential, 212 Hartree-Fock calculations, 252, 255, 333, 347 Hartree-Fock equation, 213, 259 HCN, 266, 267, 268, 270 Helix bundle, 30 Helix-coil transition, 21, 38, 56 Hellmann-Feynman theorem, 248 Helmholtz free energy, 2, 5, 286, 311 Hessian, 20, 287, 289, 313 Hexatriene, 273 High pressure phases, 331 High pressure polymorph, 357 Histogram method, 59 Holonomic constraints, 75, 77, 82, 83, 106, 128, 133
Hückel calculations, 372 Hund’s rule, 222 Hydration free energies, 311 Hydrazine, 371 Hydrocarbon interface, 191 Hydrogen bonds, 139, 166, 176, 186, 195, 300, 333, 340, 351, 357 Hydrogen fluoride, 371 Hydrophilic residue, 308 Hydrophobic contacts, 303, 308 Hydrophobic effects, 311, 321 Hyperpolarizability, 243, 247, 248, 249, 250, 252, 257, 258, 259, 264, 270, 271, 273 Hypothetical scanning method, 17, 49 Ice lattices, 195 ICES, 341, 342, 345 Ideal chain, 42 Image potentials, 146, 144, 148, 197 Imaging enhancements, 246 Importance sampling, 13, 23, 26, 41, 47, 54, 167 Initial conditions, 159 Insertions in an ensemble, 173 Insulin, 21 Integration algorithm, 82, 84, 100, 132 Interatomic forces, 207 Interfaces, 137, 144, 207 Internal coordinate constraints, 75, 82, 110, 111, 115, 130 Internal coordinates, 77 Ionic solutions, 198 IR spectroscopy, 347 Ising model, 3, 6, 7, 24, 53, 54, 59 Jellium model, 140, 143, 148, 152, 178, 208, 232, 234, 235 Jellium potential, 150 Job cuts in industry, viii Job opportunities for computational chemists, v, ix Kerr effect, 249, 251 Kinetic energy operator, 212 Kirkwood equation, 10 Kirkwood factor, 294 Kleinman symmetry, 249 Kohn-Sham density functional theory, 149 Kohn-Sham orbitals, 214, 220 Lagrangian dynamics, 77, 78
Lagrangian multipliers, 78, 81, 82, 85, 89, 98, 102, 113, 161 Lasers, 244, 246 Lattice chain models, 56 Lattice constant, 217, 219, 235 Lattice coordinates, 155 Lattice models, 42, 58, 226, 228 Lattices, 195, 335, 338 Lattice spin models, 59 Lattice symmetry, 330, 339, 341 Lattice vectors, 330 Law of cosines, 124 Layering, 196 Leapfrog Verlet algorithm, 132 Leibniz rule, 86, 87 Lennard-Jones constants, 10, 17 Lennard-Jones energy, 298 Lennard-Jones fluids, 25, 31, 173 Lennard-Jones potential, 4, 144, 154, 192, 210 Leu-enkephalin, 21, 23, 52, 56 Ligand, 39, 334 Ligand binding, 30 Linear buildup procedures, 41 Linear constraints, 77 Linear response approximation (LRA), 39 Linear response theory, 310 Linear scaling algorithms, 218 Linearization, 104, 108 Liouville’s theorem, 58 Lipids, 281 Liquid crystals, 251 Liquids, 273 Local density approximation (LDA), 150, 215, 219, 232 Local electronic bond energy, 227 Local energy minima, 337, 350 Local states (LS) method, 3, 17, 25, 51, 52 Localized microstates, 18 Lodge theory, 370 Low frequency modes, 76 Lysozyme, 39 M site, 142 Mach-Zehnder interferometer, 245 Magnetic susceptibility, 370 Many-body analytic potential energy function, 210 Marcus relationship, 39 Materials simulation, 207, 210 Matrix method, 83, 94, 103, 105, 111, 116, 118, 120
Matrix of constraint displacements, 113 Maxwell-Boltzmann distribution, 159 Maxwell’s equations, 242 MDCP, 341, 342, 345 Mean field approximation, 149 Melting points, 236 Mercury surfaces, 139, 176, 177, 181, 182, 186 Mercury-mercury potentials, 145 Mercury-water interface, 179, 185, 192 Metal clusters, 138, 139, 144, 197 Metal surfaces, 137, 138, 140, 147, 148, 152, 186, 196, 231 Metals, 230, 231, 233 Metal-water interfaces, 137, 143, 153, 175, 193, 194 Metastable structures, 350 Met-enkephalin, 21, 56, 58 Methanol, 31, 311, 317 Method of strides, 47 Method of undetermined multipliers, 102 Method of undetermined parameters, 81, 95, 101, 111, 126 Methyl chloride, 29 9-Methyladenine, 30 4-Methylpyridine, 329 1-Methylthymine, 30 Metropolis algorithm, 343 Metropolis Monte Carlo, 1, 13, 15, 52, 55, 56, 166, 175 Microcanonical ensemble, 158 Minimal basis sets, 272 Minimization, 350 Minimizers, 337 Minimum free energy principle, 6, 7, 45, 54 MISSYM, 345 MM2, 341 MNDO, 254, 255 Mobility, 187 Modeling, 140 MOLDEN, 335 Molecular conformations, 345, 347 Molecular design, 307 Molecular dynamics (MD), 1, 13, 15, 17, 41, 52, 75, 133, 140, 152, 159, 186, 196, 292, 298, 313, 322, 342, 346, 350, 373 Molecular dynamics trajectories, 289 Molecular electrostatic potential (MEP), 334 Molecular flexibility, 329, 344 Molecular mechanics (MM), 286, 333 Molecular orbital energies, 221
Molecular orbital (MO) basis, 262 Molecular packing analysis, 329 Molecular properties, 247 Molecular recognition, 306 Molecular vibrations, 110 Møller-Plesset (MP) perturbation theory, 264 MOLPAK, 341, 342, 345, 346 Moments, 222, 223 Moments theorem, 224 Monolayers, 144 Monosaccharides, 329, 342, 353 Monte Carlo (MC) calculations, 1, 13, 15, 25, 31, 44, 55, 56, 140, 152, 166, 168, 174, 187, 196, 292, 341, 343, 349, 373 MOPAC, 252, 335 Morphology, 327 Morse function, 235 MPA, 341, 342, 345, 346 MSHAKE, 132 MSI PP, 345 Mulliken charges, 334 Multicanonical algorithm, 3, 16, 56 Multicanonical probability, 56 Multiphoton pumping mechanisms, 246 Multipole expansion, 342 Multistage sampling method, 59 Myoglobin, 32 Naphthalene, 132 Native structure, 35 Newtonian mechanics, 78 NIST*LATTICE, 344 p-Nitroaniline, 271 NMR, 347, 353 Nonbonded cutoffs, 297, 303 Nonbonded interactions, 210 Nonbonded parameters, 307 Nonchiral compounds, 348 Nonholonomic constraints, 95 Nonlinear effects, 310 Nonlinear constraints, 107 Nonlinear optical (NLO) properties, 236, 241, 252, 256, 263 Nonlinear scaling relationships, 312 Nonlinearity, 99 Nonphysical transformations, 31, 32, 35 Non-self-consistent treatments, 149 Nonvariational methods, 248 Normal mode analysis, 20, 290, 313 Nosé-Hoover thermostats, 163 Numerical breakdown, 255
Numerical drift, 81 Numerical experiments, 367 Numerical integration, 89, 90, 160 NVT ensemble, 163 One-electron approximation, 212 OPLS force field, 315 Optical bistability, 245 Optical data storage, 246 Optical Kerr effect, 244 Optical rectification, 244 Optical signal processing, 245 Optical storage devices, 244 Order parameters, 182, 194 OREMWA method, 337 Orientational potentials, 144 Packing energy, 331 Pair-additive interactions, 210 Paraffins, 333 Parameter optimization, 319 Partial atomic charges, 190, 292, 301 Partially constrained coordinates, 97, 101 Partially rigid models, 79 Particle mesh Ewald, 158 Partition function, 4, 9, 16, 18, 48, 59, 167, 169 PCILO, 373 PCKS, 329 PCK83, 344 PDM93, 335 Peptide growth simulation, 38 Peptides, 56 Periodic boundary conditions, 154, 155 Periodic conditions, 335 Periodic interactions, 156 Perturbation potential, 287 Perturbation series expansion, 247 Perturbation theory, 248, 256, 264, 309 Pharmaceutical industry, x, 327, 375 Pharmaceutical Research and Manufacturers of America (PhRMA), viii Phase conjugation, 246 Phase space, 5, 58, 77, 165, 309 Phase space volume, 58 Phonons, 145 Physical transformation, 40 Physisorption, 137, 139 Physisorption of water, 140, 143, 182 Pigment red, 353 pKa calculations, 31, 39, 309 Platinum-water potential, 146, 185
PLATON, 344 Pockels effect, 244 Point mutation, 307 Poisson-Boltzmann method, 29 Poisson’s equation, 152 Polarizability, 243, 247, 248, 252, 258, 266, 292 Polarizable water model, 318 Polarization, 147, 149, 191, 242, 292 Polarization propagators, 263 Polarized continuum model (PCM), 273 Polyacetylene, 273 Polyalanine, 38 Polyenes, 265, 271, 272 Polymer films, 273 Polymers, 2, 42 Polymorph prediction process, 351 Polymorph Predictor, 338, 343, 345, 346, 355 Polymorphs, 327, 350 Polypeptides, 30, 36 Polysaccharide structures, 329 Positional fluctuations, 313 Potential drop at interfaces, 180, 190, 192 Potential energy function, 36, 139, 142, 146, 207, 208, 211, 283, 329, 333 Potential energy function refinement, 318 Potential energy surface (PES), 2, 19, 209 Potential of mean force (PMF), 25, 26, 58, 60, 76, 198, 297 Powder diffraction, 328, 332, 338, 347, 350, 352, 353, 354 Predictor-corrector SHAKE algorithm, 132 Prednisolone t-butylacetate, 352, 358 Primitive cell, 331, 352 Principal component analysis (PCA), 283, 290, 312, 316, 317, 320 Probability density, 1 PROMET3, 340, 345, 346 Propagators, 263 Protein engineering, 307 Protein environment, 35 Protein folding, 18, 30, 42, 56, 58, 303, 307, 308 Proteins, 2, 17, 22, 37, 39, 56, 75, 281 Pseudoenergy, 264 Pseudopotential, 210 Pyrimidine, 342 Quadrupole moments, 191 Quantum chemistry, 248, 370 Quantum mechanical bonding, 208
Quantum mechanical calculations, 144, 212, 265 Quantum mechanical entropy, 21 Quantum mechanics (QM), 333, 368 Quantum Monte Carlo calculations, 150 Quasi-harmonic approximation, 22, 312, 313 Quaternions, 78, 126, 173 Quinacridone, 352, 358 Racemates, 348 Radial distribution function, 26, 141, 143, 166, 180, 185, 300 Radius of gyration, 30 Random coil, 51 Random number generator, 174 Random phase approximation (RPA) methods, 261, 263 Random walk, 2, 42 Ras protein, 52 RATTLE, 82, 83, 128, 129, 132, 133 Reaction coordinates, 27, 30, 76 Reaction field model, 273 Receptor, 334 Reciprocal space, 336 Reduced cell, 331, 338, 344 Redundancy of constraints, 79, 109 Reference state, 39 Refractive index, 243, 245, 246 Research and development (R&D) expenditures, viii Response functions, 264 Restraint potentials, 36, 38 Rietveld method, 347 Rigid body minimization, 337, 342 Rigid body translation, 288 Rigid bodies, 333, 343, 353 Rigid models, 78 Rigid water model, 116, 126 RNase T, 36 Rotamers, 352 Rotation matrix, 170 Runge-Kutta integration algorithm, 91 Sampling theory, 12 Scanning method, 3, 7, 44, 46, 51 Scanning probe microscope (SPM), 138, 197 Scanning transition probabilities, 49 Schrödinger equation, 212 Scoring functions, 334, 340 SCRIPT, 375 Second harmonic generation (SHG), 243, 244, 245, 251
Second-moment approximation, 224, 225 Self-avoiding walks (SAWs), 42, 50 Self-consistent field (SCF), 260 Self-intersecting walks, 42, 50 Semiempirical molecular orbital approximations, 218, 271 Sensitivity analysis, 281, 290, 311 Sensitivity coefficients, 283, 284, 285, 307, 319 Sensitivity matrix, 287, 291, 316 Serine dipeptide, 315 SETTLE method, 82, 83, 132 SHAKE method, 82, 83, 106, 108, 110, 111, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 128, 129, 132, 133 Shear constants, 233 SIBFA force field, 374 Silver-water interfaces, 185 Simple fluids, 18 Simple sampling, 14, 42, 44 Simulated annealing, 337 Simulation box, 155 Singular value decomposition (SVD), 283, 290, 316 Slater determinant, 213 Slow growth thermodynamic integration, 29, 34, 37 Smooth surfaces, 176 Smooth truncation, 156 SN2 reaction, 29 Sodium chloride (NaCl), 29 Software, xiii, 225, 373 Solutes, 311 Solvation free energy, 31 Solvent effects, 289, 314 Solvent-accessible surface area (SASA), 40 Solvent-free polymorphs, 355 Solvochromatic method, 258 Somatostatin, 23 Space group constraints, 346 Space group symmetry, 331, 340, 342, 344, 348, 350 SPC water model, 141, 163, 292, 298, 299 SPC/E water model, 123, 124, 126, 141, 163 Specific heat, 6, 9, 11, 24 Spectral density, 189 Square lattice, 42, 48, 53, 225, 303, 308 State function, 32 Statistical mechanics, 4, 209, 373 Step-by-step buildup, 3 Sterically accessible regions, 192
Stochastic models method, 7, 53 Stochastic process, 15 Störmer algorithm, 101 Structural diversity, 339 Structural properties, 180 Structural response, 313, 314 Structure-binding relations, 209 Sum-over-states (SOS) methods, 252, 256, 263 Supercell, 350 Supercomputers, 375 Superlattice, 350 Surface area, 40, 311 Surface corrugation, 144, 197 Surface effects, 340 Surface polarization, 139 Surface properties, 210, 223 Susceptibilities, 242, 248 Symmetry, 331, 340, 342, 344, 348, 350 Systematic search, 346 Target state, 39 Tautomeric forms, 355 Taylor series, 95, 100, 104, 107, 114, 243, 247 Teaching computational chemistry, 375 Temperature constraint, 95 Theoretical biochemistry, 369, 372 Theoretical chemistry, 368 Thermal expansion, 236 Thermodynamic cycle, 33, 34 Thermodynamic integration, 2, 9, 11, 24, 31, 33, 36, 37, 76 Thermodynamic perturbation, 36, 76 Thermodynamic properties, 286 Thermodynamics, 331 Third harmonic generation, 244 Three-dimensional grid, 340 Threonine, 335 Threonine dipeptide, 315 Tight binding method, 208, 218 Time scales, 75, 175, 176, 186 Time steps, 90, 133, 161, 162, 198 Time-dependent Hartree-Fock method, 258 Time-dependent response functions, 263 Time-dependent Schrödinger equation, 259 Tinfoil boundary conditions, 337 TIP3P, 292 TIP4P, 141, 163 TIP-4FP, 141, 163 Torsional constraints, 120, 130, 131 Torsional interactions, 333
Trajectory, 101, 209 Transferability, 211 Transformation paths, 37 Transition moments, 257, 258 Transition probabilities, 2, 41, 49, 51, 56 TREOR90, 347 Trial crystal packings, 342 Trial structures, 340, 345 Triangulation procedure, 82, 109, 123 Tribochemistry, 231 Triphenylphosphine, 329 Triphenylverdazyl, 329 Trypsin, 32, 40 Tryptophan, 309 Two-dimensional bias function, 29 Two-electron integral calculations, 255 Two-photon upconverted emission, 247 Umbrella sampling, 14, 16, 23, 24, 25, 27, 30 Unconstrained coordinates, 102, 103 Undetermined parameters, 81, 82, 98, 100, 111 Unit cells, 329, 330, 331, 344 United atom model, 92, 342 Unrestricted Hartree-Fock (UHF) computations, 370 UPACK, 341, 342, 345, 346 Valence bond calculation, 371 Valence electrons, 148, 197 van der Waals energy, 336 van der Waals interactions, 333, 346 van der Waals surface, 319 Variational principle, 213, 233, 259 Velocity autocorrelation function, 188 Velocity Verlet integration algorithm, 83, 126 Verlet integration algorithm, 83, 101, 102, 111, 126, 127, 160, 164 Vibrational calculations, 273 Vibrational dynamics, 110, 113 Vibrational frequency, 217 Vibrational modes, 141 Visualization, 176 Volume, 40 Water, 17, 27, 29, 31, 80, 109, 139, 141, 142, 144, 148, 159, 160, 161, 162, 166, 173, 177, 181, 183, 184, 190, 193, 293, 298, 301, 318 Water density profile, 194 Water models, 292
Water-metal potentials, 144, 148 Water-water interactions, 37, 194 Wave equation, 242 Wavefunction, 212, 213 Weak coupling, 76 Wide microstates, 18 Wigner-Seitz radius, 148 Wilson G matrix, 113 Wilson vectors, 110, 112, 118, 119 Windows, 24
WMIN, 329, 342 Work function, 149 World Wide Web, u, xiii Xenon, 32 X-ray powder diffraction, 328, 332, 338, 347, 350, 352, 353, 354 X-ray scattering, 194 Zwanzig equation, 10