PROTEIN
A Comprehensive Treatise Volume 2 •
1999
PROTEIN
A Comprehensive Treatise
Editor:
GEOFFREY ALLEN London, England
VOLUME 2 • 1999
JAI PRESS INC.
Stamford, Connecticut
Copyright © 1999 by JAI PRESS INC.
100 Prospect Street
Stamford, Connecticut 06904
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any way, or by any means, electronic, mechanical, photocopying, recording, filming, or otherwise, without prior permission in writing from the publisher.
ISBN: 1-55938-672-X
Manufactured in the United States of America
CONTENTS
List of Contributors
Preface
Geoffrey Allen
Chapter 1 Protein Crystallography
Anirhuddha Achari and David K. Stammers
Chapter 2 The Chemistry of Protein Functional Groups
Gary E. Means, Hao Zhang, and Min Le
Chapter 3 Electrostatic Effects in Proteins: Experimental and Computational Approaches
Norma M. Allewell, Himanshu Oberoi, Meena Hariharan, and Vince J. LiCata
Chapter 4 The Binding of Ions to Proteins
Jenny P. Glusker
Chapter 5 Protein Folding
Franz X. Schmid
Chapter 6 Thermodynamics of Protein Folding and Stability
Alan Cooper
Chapter 7 Protein Hydrodynamics
Stephen E. Harding
Index
LIST OF CONTRIBUTORS
Anirhuddha Achari
Glaxo Wellcome Medicines Research Centre, Stevenage, Herts, England
Norma M. Allewell
Department of Biochemistry, University of Minnesota, St. Paul, Minnesota
Alan Cooper
Chemistry Department, Glasgow University, Glasgow, Scotland
Jenny P. Glusker
Institute for Cancer Research, The Fox Chase Cancer Center, Philadelphia, Pennsylvania
Stephen E. Harding
School of Biology, University of Nottingham, Sutton Bonington, England
Meena Hariharan
Department of Biochemistry, University of Minnesota, St. Paul, Minnesota
Gary E. Means
Department of Biochemistry, The Ohio State University, Columbus, Ohio
Min Le
Department of Biochemistry, The Ohio State University, Columbus, Ohio
Vince J. LiCata
Department of Biochemistry, University of Minnesota, St. Paul, Minnesota
Himanshu Oberoi
Department of Biochemistry, University of Minnesota, St. Paul, Minnesota
Franz X. Schmid
Biochemisches Laboratorium, Universität Bayreuth, Bayreuth, Germany
David K. Stammers
Laboratory of Molecular Biophysics, University of Oxford, Oxford, England
Hao Zhang
Department of Biochemistry, The Ohio State University, Columbus, Ohio
PREFACE
In Volume 1 of this series, the structures of protein molecules were described, together with computational methods linking sequence data to folded structure and function. The determination of protein structure by nuclear magnetic resonance spectroscopy was also presented. The current volume continues the theme of protein structure with an outline of methods of crystallographic structure determination. Subsequent chapters describe various structure-related properties of proteins. The chemistry of protein functional groups, with emphasis on reagents used to modify proteins chemically, is covered in Chapter 2. Complementary chapters on electrostatic effects in proteins and on the binding of ions to proteins follow. Protein folding, an area of significant recent advances in understanding, is described in two chapters: one on folding pathways and the other on the thermodynamics of protein folding and stability. Finally, the hydrodynamic properties of proteins, which primarily reflect their molecular size and shape, are covered in Chapter 7. I thank the authors for their contributions, which should be valuable to newcomers to protein science as well as to those already expert in various aspects of the field.
Geoffrey Allen
Editor
Chapter 1
Protein Crystallography ANIRHUDDHA ACHARI and DAVID K. STAMMERS
Abstract
Introduction
Protein Crystallization
The Crystallization Process
Factors Affecting Crystal Growth
Crystallization Methods
Diffraction of X-rays
Crystallographic Data Collection
Synchrotron Sources
Detection of Diffracted X-rays
Data Reduction and Processing
Methods of Phase Determination
Molecular Replacement
Multiple Isomorphous Replacement
Anomalous Dispersion
Map Improvements
Density Modification
Maximum Entropy Techniques
Structure Refinement
Final Model and Validity of the Structure
Protein: A Comprehensive Treatise Volume 2, pages 1-22 Copyright © 1999 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 1-55938-672-X
ABSTRACT
Almost all known three-dimensional protein structures have been determined by X-ray crystallography. The method requires protein crystals of high quality. Various approaches are available for obtaining them, including partial proteolysis, addition of antibody Fab fragments, and protein engineering. A brief outline of the X-ray diffraction process, including the sources and detection of X-rays, is presented. The major hurdle in deriving a structure from the measured intensities is determining the associated phases. Multiple isomorphous replacement using heavy atoms remains an important technique for this. An array of computer software, including molecular dynamics methods, is available for refining structures.
INTRODUCTION
Knowledge of the three-dimensional structures of proteins is now routinely used to understand the functional properties of these macromolecules. It is considered an essential framework on which to bring together and rationalize diverse biochemical and genetic data. In addition, knowledge of the sites at which ligands bind to macromolecules can be used to design novel inhibitors with potential for development as drugs. These advances have largely been the result of technological developments. First, recombinant DNA techniques and heterologous expression of proteins have made available, in sufficient quantities, a vast array of proteins that were previously impractical to purify from natural sources such as cells or tissues. Second, developments in the production of X-rays and in highly sensitive area detectors have been of key importance. These developments have been coupled with computer hardware and software capable of structure refinement and with computer graphics for model building. Together they have made possible the rapid structure determination of an increasing number of complex proteins and macromolecular assemblies. The number of structures reported in the Brookhaven Protein Data Bank is currently growing exponentially. The vast majority of three-dimensional protein structures determined to date have been solved by X-ray crystallography. This remains the most general method for determining the three-dimensional structures of proteins, as it is applicable even to proteins with molecular weights greater than 700 kDa. By contrast, NMR methods, in spite of significant developments in multidimensional methodology and high-field spectrometers, are currently limited to de novo structure determination of proteins with an upper molecular weight limit of 25 kDa. One prerequisite for X-ray crystallographic structure determination of proteins is the growth of suitable crystals.
Crystals must be of suitable size and internal order to allow high-resolution X-ray data to be recorded. In some cases this is a nontrivial problem that requires particular attention to the purity of the protein preparation and the setting up of numerous crystallization trials. For some proteins, the crystallization stage can still be the rate-limiting step in the structural analysis. In this chapter we describe some of the methods used to determine protein structures by X-ray diffraction, including how suitable protein crystals are obtained.
PROTEIN CRYSTALLIZATION
Many stages in the structural analysis of a protein, such as X-ray data collection, calculation of electron density maps, and refinement of the protein model, are well understood. Crystallization, by contrast, is the least well understood part of the whole structure determination process. Developments in this field are therefore largely the result of empirical knowledge, and this experimentally derived methodology has expanded greatly since the early years of the subject. The relative lack of understanding of protein crystallization stems from the complex nature of proteins. As large polyelectrolytes of low symmetry, proteins have properties that vary with a wide range of factors, such as pH, temperature, and ionic strength. A selection of review articles and practical guides to protein crystallization is available and should be consulted for more details, including experimental protocols (McPherson, 1982, 1990; Carter, 1990; Ducruix and Giegé, 1992). In this chapter, we give a brief overview of the crystallization process together with an update on some current developments in the field.
The Crystallization Process
As with small molecules, proteins are crystallized by producing a supersaturated solution. This is a metastable state that is thermodynamically unstable and reaches equilibrium by forming either a precipitate or crystals. Crystallization proceeds in three stages: nucleation, a growth phase, and finally the cessation of growth. Spontaneous nucleation consists of the formation of a stable aggregate that then provides surfaces suitable for crystal growth. Growth halts in one of three ways: the crystallization process itself lowers the protein concentration, the lattice becomes deformed, or impurities block the growing crystal faces (Weber, 1991).
The crystallization process is entropically unfavorable because the molecules lose translational and rotational degrees of freedom as they are packed into a crystal lattice. In the case of proteins, surface loops are also constrained within the crystal. To counterbalance this, there must be a favorable enthalpy change large enough to make the overall free energy change favorable and so drive crystallization. This enthalpic gain comes from adding an agent, such as a salt, that competes with the water solvating the exposed amino acid side chains on the protein surface. The resulting desolvation leads to favorable interactions with neighboring molecules and hence to crystallization (Weber, 1991).
Essentially three types of agent can compete with bulk solvent and thereby induce proteins to crystallize. The first type is salts, such as ammonium sulphate or sodium phosphate. The second is organic solvents, such as ethanol, methylpentanediol (MPD), or isopropanol. The third is long-chain polymers, such as polyethylene glycol (PEG). These precipitants have been used both singly and in various combinations. The scientific literature on conditions for protein crystallization has been collated in the Biological Macromolecule Crystallization Database (BMCD; Gilliland and Bickham, 1990). Analysis of this database indicates that ammonium sulphate is the most commonly used precipitant for protein crystallization, followed by PEG 6000 and MPD.
Factors Affecting Crystal Growth
Many factors are known to affect the growth of protein crystals; McPherson (1990) lists 26 that he considers important in his experience. A distinction can be made between extrinsic factors on the one hand and variants within the protein itself on the other. Examples of both classes are listed in Table 1 and discussed below.
Table 1. Some Factors Important in Crystallizing Proteins
Extrinsic factors                      Intrinsic factors
pH, temperature                        Purity
Ionic strength                         Different species
Protein concentration                  Protease digestion
Precipitant type and concentration     Truncated forms
Crystallization method                 Single point mutations
Metal ions, detergents                 Aggregation state
Ligands, Fabs                          Glycosylation state
Seeding
Microgravity
Extrinsic Factors
Changes in pH can alter the ionization of certain amino acid side chains. This in turn affects their interactions with neighboring molecules and the crystallization properties of the protein. Changes of less than 0.5 pH unit can affect crystal growth. Temperature can affect both the solubility and the stability of a protein. Protein crystals are generally obtained either at 4 °C or close to room temperature (~22 °C). Ionic strength is a key factor in promoting crystallization from high-salt conditions. Some proteins have a solubility minimum at low ionic strength and hence can be crystallized by dialysis against low-salt solutions. The crystallization method itself can determine whether crystals appear. The methods used fall into three common types: batch, vapor diffusion, and equilibrium dialysis; these are described in more detail below. Added metal ions can play important roles in bridging molecules in the crystal lattice, even when they are of no functional or structural importance in the native protein. Nonionic detergents, particularly n-octyl glucoside, were first used for crystallizing membrane proteins. They have since been shown to benefit soluble proteins as well (McPherson, 1990). The detergent is thought to reduce nonspecific hydrophobic interactions. Adding a few percent of an organic solvent such as DMF or ethanol may have a similar effect (Miller et al., 1989). It has long been observed that the presence of a protein ligand, such as a substrate or inhibitor, can have dramatic effects on crystallization. This may be the result of a conformational change in the protein. Alternatively, the ligand may produce a general tightening of the structure to give a more rigid molecule. Such tightening is reflected in the generally greater resistance of ligand-bound forms to proteolysis compared with their apoprotein equivalents. Adding the Fab fragment of an antibody has been found useful in crystallizing proteins such as neuraminidase (Laver, 1990). The Fab may stabilize a surface loop, or it may simply provide an additional surface on which crystal contacts can form. Seeding methods can be crucial in obtaining large crystals suitable for a structural analysis.
Two basic seeding methods are employed. In macroseeding, small crystals are washed and then introduced into a fresh supersaturated protein solution. In microseeding, a crystal is crushed, the resulting suspension is diluted, and a small amount is introduced into a supersaturated protein solution. A variant of this is "streak" seeding, in which seeds are transferred with a cat's whisker (Stura and Wilson, 1990). Methods for crystallizing proteins under microgravity in space have been developed over the last 10 years. For two proteins, improved diffraction or better crystal morphology has been observed (DeLucas et al., 1989).
Variants within the Protein
Among the intrinsic variables, the importance of highly pure protein cannot be overstated. Before recombinant DNA methods were available, early work emphasized preparing the protein of interest from different species, as variations in surface residues can be key to producing usable crystals. Limited protease digestion can be used to clip off "floppy" regions of the protein, leaving a more rigid core domain. Once the proteolytic cleavage sites have been identified, recombinant DNA methods can be used to express large quantities of these domains for functional characterization, crystallization trials, and structure determination.
An exciting development in the field is the use of protein engineering to improve a protein's crystallization properties. This is illustrated by the elegant work on HIV integrase (Dyda et al., 1994). This protein had previously resisted all attempts at crystallization, largely because of its tendency to aggregate. Putative surface regions of the protein that might give rise to hydrophobic interactions were identified. Site-directed mutants were then constructed in which hydrophobic residues were replaced by alanine or lysine. Screening a variety of single-point mutants revealed that replacing phenylalanine 185 with lysine (F185K) increased the solubility of the expressed protein. This mutant form of the catalytic core domain of HIV integrase crystallized easily, which in turn led to rapid determination of the structure (Dyda et al., 1994).
Posttranslational modifications of proteins expressed in eukaryotic systems can be a source of heterogeneity that gives rise to poorly ordered crystals. In some cases this can be overcome by completely inhibiting the modification within the expression system. For example, inhibiting the glycosylation of CD2 gave a more homogeneous preparation (Davis et al., 1993).
The aggregation state of the protein preparation can be crucial in determining whether a protein can be crystallized. Aggregation can result from the method of purification or from some intrinsic property of the protein. Dynamic light scattering has been successfully applied to screen protein preparations for aggregation or polydispersity. A good correlation has been observed between the monodispersity of protein preparations and their ability to crystallize (Zulauf and D'Arcy, 1992).
Crystallization Methods
The earliest method used for crystallizing proteins was the batch method, which has the disadvantage of consuming relatively large amounts of protein. Various equilibrium dialysis methods have been developed (Zeppezauer, 1971), but by far the most common method now used is vapor diffusion (McPherson, 1982). In this method, equal volumes of protein solution (generally 5-30 mg/ml) and precipitant solution are usually mixed. In "hanging drop" vapor diffusion, the protein/precipitant drop is placed on a siliconized cover slip. The cover slip is then inverted and sealed with vacuum grease over a reservoir of precipitant solution. Vapor equilibration between the drop and the reservoir produces supersaturation in the drop and, ideally, crystal growth. An alternative is the "sitting drop" method, in which the drop is placed on a bridge above the reservoir.
Crystallizing certain proteins may require surveying many thousands of crystallization conditions. To reduce the manual labor involved and to improve precision, a number of automated systems have been developed. These range from modified pipetting stations (Cox and Weber, 1987) to fully automated robots with video camera monitoring of crystal growth and associated database record keeping (Jones et al., 1987). To make the best use of often limited quantities of material, statistical methods have been devised for sampling the multidimensional space of crystallization variables efficiently (Carter and Carter, 1979). An alternative approach is to use standard sets of conditions, usually about 50, based on the most commonly used crystallization conditions (Jancarik and Kim, 1991). This sparse-matrix approach has proved extremely successful in crystallizing a wide variety of proteins. Because the reagents are commercially available, it is the simplest first step in attempting to crystallize a protein. Various other sets of standard conditions are available, and many laboratories create their own based on experience with particular proteins.
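The end point of vapor diffusion can be estimated by simple mass balance. The sketch below is a minimal idealization that rests on three assumptions. Only water moves between drop and reservoir. The reservoir is large enough for its concentration to stay constant. Equilibrium is reached when the precipitant concentration in the drop matches that of the reservoir. The names `Drop`, `mix_drop` and `equilibrate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Drop:
    volume_ul: float       # drop volume in microlitres
    protein_mg_ml: float   # protein concentration
    precipitant: float     # precipitant concentration (e.g. M or % w/v)


def mix_drop(protein_ul, protein_mg_ml, reservoir_ul, reservoir_conc):
    """Mix protein solution with reservoir (precipitant) solution."""
    total = protein_ul + reservoir_ul
    return Drop(
        volume_ul=total,
        protein_mg_ml=protein_mg_ml * protein_ul / total,
        precipitant=reservoir_conc * reservoir_ul / total,
    )


def equilibrate(drop, reservoir_conc):
    """Idealized end point (assumption): water leaves the drop until its
    precipitant concentration equals that of the effectively infinite reservoir."""
    final_volume = drop.volume_ul * drop.precipitant / reservoir_conc
    factor = drop.volume_ul / final_volume  # concentration factor
    return Drop(final_volume, drop.protein_mg_ml * factor, reservoir_conc)


# Hypothetical example: 2 ul of 10 mg/ml protein + 2 ul of 2.0 M reservoir.
start = mix_drop(protein_ul=2.0, protein_mg_ml=10.0,
                 reservoir_ul=2.0, reservoir_conc=2.0)
end = equilibrate(start, reservoir_conc=2.0)
print(start)
print(end)
```

With equal volumes, the drop starts at half the reservoir precipitant concentration. In this idealization, it then shrinks to half its volume, so both protein and precipitant concentrations double as supersaturation is approached. Real equilibration kinetics are not modeled.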
DIFFRACTION OF X-RAYS
A crystal can be considered as a three-dimensional diffraction grating made up of regularly repeating molecules or atoms; the basic repeating unit is known as the unit cell. A unit cell is defined by three axes, denoted a, b, and c, and three interaxial angles: α between b and c, β between a and c, and γ between a and b. Planes of atoms in a crystal are assigned Miller indices. These are the reciprocals of the intercepts of the plane on the a, b, and c axes (expressed as fractions of the cell edges), cleared to the smallest integers. Thus a plane parallel to a and b has indices 001, and a plane with indices 235 has intercepts of 1/2, 1/3, and 1/5 on the a, b, and c axes, respectively. Seven crystal systems are defined:
1. Triclinic: no restrictions on a, b, c or on α, β, γ
2. Monoclinic: α = γ = 90°, β ≠ 90°; no restrictions on a, b, c
3. Orthorhombic: α = β = γ = 90°; no restrictions on a, b, c
4. Tetragonal: α = β = γ = 90°; a = b, c any length
5. Hexagonal: α = β = 90°, γ = 120°; a = b, c any length
6. Trigonal: α = β = 90°, γ = 120°; a = b, c any length; or α = β = γ ≠ 90°; a = b = c
7. Cubic: α = β = γ = 90°; a = b = c
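The intercept-to-index rule for Miller indices can be written as a short function. In this sketch, intercepts are given as multiples of a, b and c, and `math.inf` marks an axis to which the plane is parallel. The function name `miller_indices` is hypothetical.

```python
from fractions import Fraction
from math import gcd, inf


def miller_indices(intercepts):
    """Return Miller indices (h, k, l) for a plane with the given intercepts
    on a, b, c (in units of the cell edges; math.inf = parallel to that axis).
    A plane through the origin must first be replaced by a parallel plane."""
    recips = [Fraction(0) if x == inf else 1 / Fraction(x) for x in intercepts]
    # Clear fractions by multiplying through by the LCM of the denominators.
    lcm = 1
    for r in recips:
        lcm = lcm * r.denominator // gcd(lcm, r.denominator)
    ints = [int(r * lcm) for r in recips]
    # Reduce to the smallest integers with the same ratio.
    g = 0
    for n in ints:
        g = gcd(g, n)
    return tuple(n // g for n in ints) if g else tuple(ints)


print(miller_indices([inf, inf, 1]))                                      # (0, 0, 1)
print(miller_indices([Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]))   # (2, 3, 5)
print(miller_indices([1, 2, 3]))                                          # (6, 3, 2)
```

The first two calls reproduce the 001 and 235 examples in the text. Exact `Fraction` arithmetic avoids rounding problems when clearing denominators.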
Rotational symmetry in a crystal can be twofold, threefold, fourfold, or sixfold, or a combination of these, depending on the crystal system. (An n-fold rotation axis relates objects in the unit cell by rotations of 360/n degrees.) Screw axes combine a rotation with a specified translation along the axis. For example, a twofold screw axis along b means a 180° rotation about b followed by a translation of b/2 along b. The rotational symmetry defines the point group of a crystal, and the combination of rotational and translational symmetry assigns it to its space group. The asymmetric unit is the smallest part of the unit cell from which the rest of the cell, and hence the whole lattice, can be generated by these rotations and translations.
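The sketch below illustrates a screw axis by generating the symmetry-equivalent positions of an atom in space group P2₁, which has a twofold screw axis along b. Its two general positions are (x, y, z) and (−x, y + 1/2, −z) in fractional coordinates. P2₁ is chosen only as an example, the function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction as Fr


def screw_2_1_along_b(xyz):
    """Twofold screw axis along b through the origin:
    rotate 180 degrees about b, then translate by b/2."""
    x, y, z = xyz
    return (-x, y + Fr(1, 2), -z)


def into_cell(xyz):
    """Bring fractional coordinates into [0, 1) by whole lattice translations."""
    return tuple(c % 1 for c in xyz)


def p21_positions(xyz):
    """The two general positions of space group P2_1 (unique axis b)."""
    return [into_cell(xyz), into_cell(screw_2_1_along_b(xyz))]


atom = (Fr(1, 10), Fr(1, 5), Fr(3, 10))
print(p21_positions(atom))
# Applying the screw operation twice is a pure lattice translation (0, 1, 0):
print(into_cell(screw_2_1_along_b(screw_2_1_along_b(atom))) == into_cell(atom))
```

Applying the 2₁ operation twice amounts to a translation of one full cell edge along b. That is why this screw axis generates only the two positions listed within one unit cell.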
Crystals act as a diffraction grating for an incident X-ray beam. The relationship between the spacing d of a set of lattice planes, the wavelength λ, and the angle θ at which the diffracted ray is observed is given by Bragg's law:
$$2d\sin\theta = n\lambda \qquad (1)$$
where n is the order of diffraction. Equivalently,
$$\frac{d}{n}\sin\theta = \frac{\lambda}{2} \qquad (2)$$
A reflection at angle θ can therefore be regarded either as nth order from planes of spacing d or as first order from planes of spacing d/n. It is more convenient and practical to treat every reflection as first order, from planes of correspondingly different spacing. Because sin θ cannot exceed 1, the smallest spacing that gives a first-order reflection (n = 1) is d = λ/2. This is the limit of resolution obtainable from a single-crystal diffraction experiment with X-rays of wavelength λ.
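Equations (1) and (2) translate directly into code. This small sketch converts an observed scattering angle 2θ into a plane spacing and computes the λ/2 resolution limit. The function names are hypothetical. The example wavelength, 1.5418 Å, is the usual weighted mean for copper Kα radiation.

```python
import math


def d_spacing(two_theta_deg, wavelength, order=1):
    """Plane spacing d from Bragg's law, 2 d sin(theta) = n * lambda.

    two_theta_deg is the scattering angle 2*theta in degrees; d is returned
    in the same units as wavelength (here angstroms)."""
    theta = math.radians(two_theta_deg / 2)
    return order * wavelength / (2 * math.sin(theta))


def resolution_limit(wavelength):
    """Smallest first-order spacing: sin(theta) cannot exceed 1, so d_min = lambda / 2."""
    return wavelength / 2


cu_kalpha = 1.5418  # angstroms
print(round(d_spacing(45.1, cu_kalpha), 2))  # reflection observed at 2-theta = 45.1 degrees
print(resolution_limit(cu_kalpha))
```

Passing `order=2` returns twice the first-order spacing, which is the equivalence expressed by equation (2). At 2θ = 180° the computed spacing equals the λ/2 limit, about 0.77 Å for copper Kα. This is well beyond the resolution to which protein crystals typically diffract.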
CRYSTALLOGRAPHIC DATA COLLECTION
Before the recent development of methods for cryocooling protein crystals to near liquid-nitrogen temperatures (Rodgers, 1994), data were commonly collected from a crystal mounted in a sealed glass or quartz capillary with a drop of mother liquor at one end. This kept the crystal in the hydrated environment in which it was grown. For flash freezing, the crystal is first equilibrated with a cryoprotectant solution containing glycerol or polyethylene glycol. It is then captured in a fiber loop and rapidly frozen in a cold stream of nitrogen gas (such devices can be purchased from Oxford Cryo Systems or made in-house). The crystal is then mounted on an X-ray camera, optically aligned, and centered. A beam of X-rays, either monochromated or reflected from mirrors, is directed onto the crystal, and the diffracted X-rays are collected by a detector. In the laboratory, X-rays are generated by the impact of electrons on a metal target, usually copper or molybdenum. The electron impact creates vacancies in the K shell of the target atoms. The target emits characteristic X-rays when electrons from the L shell fall into these vacancies. Copper Kα radiation (λ = 1.54 Å, E ≈ 8 keV) is the usual choice for protein crystals in the laboratory. In a rotating-anode generator, the water-cooled copper anode is rotated. This allows a higher power loading and a stronger X-ray beam than a fixed anode.
Synchrotron Sources
In the late 1970s, the availability of X-rays from the synchrotron at Daresbury, U.K., opened up a new dimension for macromolecular crystallographic experiments. In storage rings or synchrotrons, electrons or positrons move at relativistic velocities in a closed loop. The particles are confined by magnets and move at velocities close to the speed of light. They emit X-rays when they are accelerated or decelerated, and this radiation is captured by beam lines tangential to the storage ring. The critical wavelength that characterizes the emitted spectrum is λc = 0.559R/E³. Here λc is in nanometres, R is the radius of curvature of the particle orbit in metres, and E is the particle energy in GeV. X-rays from synchrotrons have three advantages: (a) extremely high intensity, (b) a cleaner beam of low divergence, and (c) a tunable wavelength. With careful collimation and focusing, the intense beam yields high-quality data with short exposure times and a higher signal-to-noise ratio. Tunable, highly monochromatic X-rays allow the experimenter to collect data at or near the absorption edge of the metal in a metalloprotein or a heavy-atom derivative. This gives high-quality anomalous data (Harada et al., 1986; see Methods of Phase Determination). Intense synchrotron X-rays also make it possible to collect data with X-rays spanning a broad range of wavelengths, known as white radiation (the Laue method). This method allows a complete data set to be collected in a few milliseconds. It can therefore be used to take time-resolved crystallographic snapshots of enzyme catalysis (Hajdu et al., 1987; Helliwell et al., 1989; Shrive et al., 1990).
Detection of Diffracted X-rays
Film was the first medium used for the efficient detection of X-rays diffracted from macromolecular crystals. Single-counter diffractometers were later used, but these can generally make only one measurement at a time, whereas a film can record hundreds of reflections simultaneously. Because film has this advantage as an area detector, it reemerged in the 1970s as the method of choice for collecting accurate medium- to high-resolution data. This depended on the development of screenless precession photography (Xuong and Freer, 1971) and screenless oscillation photography (Arndt et al., 1973), together with improved software. In oscillation photography, the crystal is mounted on the horizontal spindle of an oscillation camera, with one of its principal axes aligned along the spindle axis. While being exposed to X-rays, the crystal is rotated about the spindle through an angle governed by the size of the unit cell. This angle is 0.25-1° for viruses and large proteins and 1-2.5° for small proteins. The diffracted beams are recorded on flat, curved, or V-shaped film cassettes. Each cassette contains a pack of three to six films. The strongest reflections are thus attenuated in the later films and recorded within their linear range of response. After a preset exposure time, the computer moves the crystal to a new position, and a fresh film cassette records the data from that position.
Position-sensitive photon detectors developed for high-energy nuclear physics provided the next generation of detectors for crystallographic use (Charpak et al., 1968). Such a detector is a chamber filled with xenon gas and containing a horizontal and a vertical plane of wires. A photon arriving through the gas-tight window of the detector ionizes xenon at a particular location, and the event is recorded by the X-Y grid of wires. Image-intensifier, or "fast," detectors use another method. The diffracted X-rays excite visible fluorescence in a phosphor-covered fiber-optic screen. The emitted light is converted to electrons, and the image is read out by a television scanning system. These detectors can be used in multidetector arrays to collect larger volumes of data faster. A further advantage of flat detectors is that they can be placed close to the crystal, or farther away to resolve the closely spaced spots from a large unit cell (Cork et al., 1974). Image plates, which use europium-doped phosphors, have replaced film for data collection both in laboratories and at synchrotron sources (Hendrickson and Ward, 1987). Charge-coupled devices (CCDs) are being developed for use as detectors at synchrotrons. Their particular advantage over image plates is a shorter read-out time, and they will soon be available in the laboratory as well.
Data Reduction and Processing
In a crystallographic experiment, the data collection strategy is to collect a large, redundant data set in which all reflections (h k l) and their symmetry-related mates are measured more than once. Multiple observations of a reflection allow better scaling of symmetry-related mates. They also provide a measure of systematic errors in the data, such as absorption. Data-processing programs such as Xengen (Howard et al., 1985, 1987), DENZO (Otwinowski and Gewirth, 1993), and XDS (Kabsch, 1988) all go through several steps:
1. Determine the unit cell, symmetry, and misorientation angles of the crystal.
2. Index the spots, or predict their positions from the parameters found in step 1, and integrate the observed intensities.
3. Reduce the data to a unique set.
4. Scale the data and reject outliers.
5. Produce a final data set containing the scaled intensities (I) or amplitudes (|F|) for each h k l.
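Step 3, reducing the data to a unique set, can be illustrated with a toy merging routine, which also checks the agreement between repeated measurements. This sketch rests on four assumptions. The intensities are already scaled. The crystal belongs to Laue group mmm (orthorhombic, e.g. space group P2₁2₁2₁), so all sign changes of h, k, and l give equivalent reflections. Anomalous differences are ignored. The function names are hypothetical. The agreement statistic computed is the conventional R_merge, Σ|I − ⟨I⟩| / ΣI; real programs differ in details such as how singly measured reflections are treated.

```python
from collections import defaultdict


def unique_index_mmm(hkl):
    """Representative of h k l in Laue group mmm, where every sign change
    of h, k and l gives a symmetry-equivalent reflection (assumption)."""
    h, k, l = hkl
    return (abs(h), abs(k), abs(l))


def merge(observations, reduce=unique_index_mmm):
    """Average repeated and symmetry-related observations.

    observations: iterable of ((h, k, l), intensity), assumed already scaled.
    Returns ({unique hkl: mean intensity}, R_merge). Singly measured
    reflections add nothing to the numerator but are kept in the denominator."""
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[reduce(hkl)].append(intensity)
    merged = {hkl: sum(v) / len(v) for hkl, v in groups.items()}
    numerator = sum(abs(i - merged[hkl]) for hkl, v in groups.items() for i in v)
    denominator = sum(i for v in groups.values() for i in v)
    return merged, numerator / denominator


obs = [((1, 2, 3), 100.0), ((-1, 2, 3), 110.0), ((1, -2, -3), 90.0),
       ((0, 0, 4), 50.0), ((0, 0, -4), 50.0)]
merged, r_merge = merge(obs)
print(merged)
print(r_merge)
```

The five hypothetical observations reduce to two unique reflections. Here R_merge is 20/400 = 0.05.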
METHODS OF PHASE DETERMINATION
The intensity of each reflection h k l is the quantity measured in an X-ray diffraction experiment. The intensity I is proportional to the square of the amplitude of the structure factor F. The structure factor is a complex number with an amplitude |F| and a phase α:
$$\mathbf{F}_{hkl} = |F_{hkl}|\,e^{i\alpha_{hkl}} \qquad (3)$$
hence
$$I_{hkl} \propto \mathbf{F}_{hkl}\,\mathbf{F}^{*}_{hkl} = \left[|F_{hkl}|\,e^{i\alpha_{hkl}}\right]\left[|F_{hkl}|\,e^{-i\alpha_{hkl}}\right] = |F_{hkl}|^{2}$$
so the measured X-ray intensity contains no phase information (see Stout et al., 1968).
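Equation (3) and the loss of phase information can be checked numerically. The sketch computes F(hkl) = Σ f_j exp[2πi(hx_j + ky_j + lz_j)] for two point atoms, assuming constant scattering factors (real scattering factors fall off with angle and are damped by thermal motion). It then shifts every atom by the same vector. The phase of F changes, but |F|², and hence the measured intensity, does not. Function names and atom parameters are hypothetical.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """F(hkl) = sum_j f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j)).

    atoms: list of (f, (x, y, z)) with fractional coordinates; f is treated
    as a constant scattering factor (no angle dependence, no thermal motion)."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for f, (x, y, z) in atoms)


def translate(atoms, shift):
    """Move every atom by the same fractional shift vector."""
    return [(f, tuple(c + s for c, s in zip(xyz, shift))) for f, xyz in atoms]


atoms = [(8.0, (0.10, 0.20, 0.30)), (6.0, (0.40, 0.15, 0.70))]
moved = translate(atoms, (0.25, 0.0, 0.0))

F1 = structure_factor((1, 0, 0), atoms)
F2 = structure_factor((1, 0, 0), moved)
print(abs(F1) ** 2, abs(F2) ** 2)           # equal, up to floating-point rounding
print(math.degrees(cmath.phase(F2 / F1)))   # about 90: phase shift of 2*pi*h*0.25
```

Two different phase sets thus give the same intensity. Recovering α from measured intensities alone is therefore impossible, which is the phase problem.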
A regular repeating function such as the electron density ρ of a crystal can be represented by a Fourier series:
$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l}|F_{hkl}|\,e^{i\alpha_{hkl}}\,e^{-2\pi i(hx+ky+lz)} \qquad (4)$$
where the triple summation is over all h k l values and V is the unit cell volume. To calculate an electron density map, the phase associated with each experimentally measured intensity must therefore be determined. This is the so-called phase problem in protein crystallography.
Molecular Replacement
Patterson (1935) showed that a Fourier map calculated with the "phaseless" |F|² as coefficients, known as a Patterson map or Patterson function, has peaks corresponding to all interatomic vectors. The idea of molecular replacement is to orient a known molecule correctly in the crystallographic unit cell of the unknown structure and then translate it to the correct position. Suppose an unknown protein has been crystallized and native data have been collected, and a set of atomic coordinates is available from a closely related structure. The known model can then be used to solve the structure of the unknown protein. The first step is to orient the model molecule using the rotation function:

$$R(C) = \int P_u(\mathbf{x})\, P_m(C\mathbf{x})\, dv \tag{5}$$

where P_u is the Patterson function of the unknown crystal, P_m is the Patterson function of the known molecule, and C is a rotation, usually parameterized by three Eulerian angles. Peaks in the rotation function R(C) indicate possible orientations of the known structure in the unknown unit cell. Interatomic vectors between atoms within a molecule (self-vectors) depend only on the structure and orientation of the molecule, not on its position, and lie close to the origin of the Patterson map. Once the orientation of the model molecule in the unknown cell has been established, the next step is to translate the molecule to its correct position in the cell. The translation function depends on intermolecular vectors (cross-vectors) between molecules related by the space-group symmetry of the unknown cell. It reaches a maximum when the correctly oriented molecule, stepped through the cell of the unknown, arrives at the correct position (Rossmann and Blow, 1962; Fitzgerald, 1988). Alternatively, one can carry out an R-factor search:

$$R = \frac{\sum \big|\,|F_o| - |F_c|\,\big|}{\sum |F_o|} \tag{6}$$

where |F_o| is the observed structure amplitude and |F_c| is the structure amplitude calculated with the model at a particular position in the cell. A minimum value of R indicates the correct location of the molecule.
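The R-factor search of Equation 6 can be illustrated with a toy calculation. The sketch below is a minimal, hypothetical one-dimensional example, not the procedure of any particular program. A three-atom "molecule" with made-up coordinates and scattering factors, treated as point atoms, sits in a centrosymmetric 1D cell. Its inversion mate plays the role of the symmetry-related copy whose cross-vectors make the amplitudes depend on position. Synthetic "observed" amplitudes are generated at a known offset. The correctly oriented model is then stepped through the cell and R is recorded at each step. No scale factor is refined because the data are synthetic.

```python
import cmath
import math

# Hypothetical 1D "molecule": fractional coordinates relative to its own
# origin, with made-up scattering factors; atoms are treated as points.
MODEL_X = [0.00, 0.05, 0.12]
MODEL_F = [6.0, 7.0, 8.0]
H_VALUES = range(1, 21)  # reflections h = 1..20


def amplitudes(t):
    """|Fc(h)| for the model placed at offset t plus its inversion mate at -(x + t)."""
    result = []
    for h in H_VALUES:
        f = 0j
        for x, fa in zip(MODEL_X, MODEL_F):
            f += fa * cmath.exp(2j * math.pi * h * (x + t))   # molecule
            f += fa * cmath.exp(-2j * math.pi * h * (x + t))  # inversion mate
        result.append(abs(f))
    return result


def r_factor(f_obs, f_calc):
    """Equation 6: R = sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def translation_search(f_obs, step=0.01):
    """Step the correctly oriented model through the cell; keep the lowest R."""
    best_t, best_r = None, float("inf")
    for i in range(int(round(1.0 / step))):
        t = i * step
        r = r_factor(f_obs, amplitudes(t))
        if r < best_r:
            best_t, best_r = t, r
    return best_t, best_r


f_obs = amplitudes(0.13)  # synthetic "observed" amplitudes, true offset 0.13
best_t, best_r = translation_search(f_obs)
print(f"best offset t = {best_t:.2f}, R = {best_r:.4f}")
```

In this toy cell the inversion centers at 0 and 1/2 are equivalent origins. Offsets t and t + 1/2 therefore give identical amplitudes, and either may be returned. This origin ambiguity is a property of the symmetry, not a failure of the search.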
Multiple Isomorphous Replacement
Perutz and co-workers used multiple isomorphous replacement to solve the phase problem for hemoglobin more than four decades ago. In this method, a heavy atom (e.g., uranium, mercury, gold) or a cluster of heavy atoms is diffused into the protein crystal. If the heavy-atom derivative crystal remains isomorphous with the native crystal, the intensities measured from the derivative will differ slightly from those of the parent. If F_PH, F_P, and F_H are the structure factors of the derivative, the native protein, and the heavy atom(s), respectively, then they are related by the vector sum:

$$\mathbf{F}_{PH} = \mathbf{F}_{P} + \mathbf{F}_{H} \tag{7}$$
We measure the amplitudes |F_P| and |F_PH|, which allows an estimate of the amplitude |F_H|. The difference Patterson function is the Fourier transform with the squared difference amplitudes (|F_PH| - |F_P|)² as coefficients. It shows peaks at the ends of vectors connecting the heavy atoms. Patterson search techniques are often used to locate the heavy atom(s) within the unit cell. Once the location of the heavy atom is known, its phase α_H is known (Figure 1). In an ideal, error-free case the phase triangle closes, and one obtains two possible values for the protein phase α_P. This ambiguity can be resolved with a second derivative, as demonstrated by the Harker (1956) diagram (Figure 2).

In the Harker construction, a circle of radius |F_P| represents the parent structure factor, whose magnitude is known but whose direction is not. The heavy-atom structure factor vector -F_H is drawn from the origin. A second circle, of radius |F_PH| (the derivative structure factor), is drawn centered at the end of -F_H. The two circles intersect at two points, which correspond to the two possible phase angles of F_P. The magnitudes of F_P and F_PH are available from the measured X-ray data. The magnitude of F_H is estimated from the difference between |F_PH| and |F_P|, and its phase is calculated from the heavy-atom positions. The heavy-atom parameters (coordinates, occupancy, and temperature factor, either isotropic or anisotropic) are refined by programs that minimize the difference between calculated and observed structure factors. The phase ambiguity left by a single derivative can be resolved by the use of a second derivative or of anomalous dispersion. With data from a second derivative (F_PH2), the correct phase α_P is located where the three circles intersect. Experimental data are not error-free, however, and so the phase triangles do not close and the circles in Figure 1 do not intersect to give an unambiguous phase for F_P. Blow and Crick (1959) introduced the idea of expressing the phase as a probability distribution of the form:

$$P_{\mathrm{iso}}(\alpha) \propto \exp\!\left[-\frac{\big(|F_{PH}| - |F_{PH}(\mathrm{calc})|\big)^2}{2E^2}\right] \tag{8}$$
where E represents the cumulative error. The quantity |F_PH| - |F_PH(calc)| is the lack of closure, a measure of how poorly the phase triangle closes. It is the difference between the observed derivative amplitude and the amplitude calculated from F_P and F_H for the trial phase α. The errors arise primarily from lack of isomorphism, inaccuracies in intensity measurement, and the scaling of native and derivative data sets. The operational philosophy of MIR is therefore to keep collecting data from more than two heavy-atom derivatives until an interpretable electron density map can be calculated. Mercurial compounds, which covalently modify free sulfhydryl groups of cysteine residues, often give "sure-shot" derivatives. With the availability of recombinant DNA techniques, cysteine residues can be introduced by site-specific mutagenesis. Otherwise, trial and error seems to remain the method of choice.

Figure 1. Estimation of protein phases in the single isomorphous replacement (SIR) case. (A) Only the magnitude of F_P is known. The loci of all possible values of F_P form a circle of radius |F_P|. The information from a single heavy-atom derivative can be used to reduce the number of possible phase values to two. (B) Both the magnitude and phase of F_H are known. The possible values of F_PH correspond to a circle of radius |F_PH|. If this circle is centered at -F_H, then, since F_P = F_PH - F_H, the points of intersection of the two circles give the two possible values of α_P.

Figure 2. Estimation of protein phases with two heavy-atom derivatives: multiple isomorphous replacement (MIR). The method of isomorphous replacement requires information from at least two heavy-atom derivatives to assign the phase of the parent structure factor unambiguously. The case of two heavy-atom derivatives is diagrammed above. The point of intersection of all three circles indicates the parent phase.

Anomalous Dispersion
The phenomenon of anomalous dispersion, or anomalous scattering, occurs when the frequency of the incident X-rays is close to an absorption frequency (absorption edge) of a heavy atom. The scattered X-rays then undergo a phase shift and are attenuated. The scattering factor f of the atom then becomes:

$$f = f_0 + \Delta f' + i\,\Delta f'' \tag{9}$$
The correction term Δf' is a negative real number representing the attenuation, and iΔf'' represents the phase shift (Figure 3). Friedel's law, |F(hkl)| = |F(-h-k-l)|, breaks down in the presence of significant anomalous scattering. The anomalous signal is often a small difference between two large numbers, the amplitudes of a reflection F(hkl) and of its Friedel mate F(-h-k-l). When measured accurately, however, it can serve in effect as a second derivative to resolve the phase ambiguity. Synchrotron sources provide X-rays of tunable wavelength, and there is now increasing interest in using them for anomalous dispersion. Applications include solving the structures of metalloproteins and of proteins in which the sulfur of methionine or cysteine has been replaced by selenium.

Figure 3. Effect of anomalous dispersion on the structure factor: a pictorial representation of Equation 9. The structure factors F(+h) and F(-h) can be expressed as sums of vectors. These represent the normal scatterers, F_N(±h), and the normal, dispersion, and absorption components of the anomalous scatterers, F_A(±h), F'(±h), and F''(±h), respectively. If only one type of anomalous scatterer is present, then the phase of F''(±h) leads that of F'(±h) by π/2.
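The breakdown of Friedel's law can be checked numerically. The sketch below uses a hypothetical one-dimensional structure of point atoms. The coordinates and the values chosen for f0, Δf' and Δf'' are illustrative assumptions, not tabulated values for any element. When all scattering factors are real, |F(h)| and |F(-h)| agree. Adding the imaginary term iΔf'' of Equation 9 to a single atom makes them differ.

```python
import cmath
import math

# Hypothetical 1D structure: (fractional x, f0) for normal light-atom scatterers.
LIGHT_ATOMS = [(0.10, 6.0), (0.27, 7.0), (0.41, 8.0), (0.66, 6.0), (0.83, 8.0)]
# One anomalous scatterer; f0, delta-f' and delta-f'' are illustrative values only.
ANOM_X, ANOM_F0, DELTA_F1, DELTA_F2 = 0.55, 34.0, -8.0, 7.0


def structure_factor(h, anomalous=True):
    """F(h) = sum_j f_j exp(2*pi*i*h*x_j), using Equation 9 for the heavy atom."""
    f = sum(f0 * cmath.exp(2j * math.pi * h * x) for x, f0 in LIGHT_ATOMS)
    # With anomalous=False the heavy atom keeps a purely real f0.
    f_heavy = complex(ANOM_F0 + DELTA_F1, DELTA_F2) if anomalous else complex(ANOM_F0)
    return f + f_heavy * cmath.exp(2j * math.pi * h * ANOM_X)


for h in (1, 2, 3):
    f_plus, f_minus = abs(structure_factor(h)), abs(structure_factor(-h))
    print(f"h = {h}: |F(+h)| = {f_plus:6.2f}  |F(-h)| = {f_minus:6.2f}  "
          f"Bijvoet difference = {f_plus - f_minus:+.2f}")
```

Printing the differences next to the amplitudes shows how large the anomalous signal is relative to |F| for these made-up parameters.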
MAP IMPROVEMENTS
Density Modification
An initial MIR map may show secondary structure elements such as β-strands and α-helices, yet still not be good enough for accurate chain tracing of the protein. One of the best-tested methods of density modification available to crystallographers is solvent flattening. It relies on the fact that the unit cell of a protein crystal contains bulk solvent outside the envelope defined by the protein molecules, and the electron density of this solvent should have a constant low value. The molecular envelope is first defined from the contiguity of the protein electron density and from the solvent content, obtained from the measured or estimated density of the crystal. Cycles of solvent flattening then follow, with the solvent-modified phases combined with the MIR phases, to give more accurate phases. In general, the larger the solvent content, the greater the improvement in phase accuracy that can be obtained. If there is more than one copy of the macromolecule in the asymmetric unit, noncrystallographic symmetry can be used for averaging to yield better phases and a cleaner electron density map (Wang, 1985). Software such as SQUASH (Cowtan and Main, 1993) combines Sayre's equation and density modification with noncrystallographic symmetry (if present) to improve phases and/or extend phases to higher-resolution data. SOLOMON (Abrahams and Leslie, 1996) exploits "solvent flipping" along with solvent flattening for phase improvement.

Maximum Entropy Techniques
Following Shannon (1948), a unique and consistent measure of the amount of "ignorance" (uncertainty, entropy) in a discrete probability distribution is given by the expression below. Here the distribution is the electron density ρ, normalized so that its values p_i sum to one over the grid points i:

$$S = -\sum_i p_i \log p_i$$

This is immediately seen to correspond to the Boltzmann expression for entropy that arises in statistical mechanics. The basic principle of this formalism is to maximize S, subject to agreement with the observed data; hence the name maximum entropy methods. Several groups are working to develop ab initio phasing of macromolecules from diffraction data. Prince and co-workers (1988) have shown that maximum entropy is a powerful technique for phase improvement and extension when the molecular envelope is available. The structures of a DNA oligomer and of several small molecules were solved by a method that applies a maximum entropy formalism, in the form of cross-entropy minimization, to phasing (Harrison, 1989; Miller et al., 1988). Carter and co-workers are developing techniques that combine maximum entropy, phase permutation, and likelihood scoring for ab initio phasing and phase improvement.
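The entropy of a map can be computed by normalizing the density values on a grid so that they sum to one. The following is a minimal sketch of that calculation only, not the algorithm of any of the programs cited. A featureless map reaches the maximum value log N for N grid points. For that reason maximum entropy methods maximize S subject to constraints, such as agreement with the measured amplitudes, rather than on its own.

```python
import math


def density_entropy(rho):
    """Shannon entropy S = -sum p_i log p_i of a density map normalized to sum 1.

    Grid points with zero density contribute nothing (p log p -> 0 as p -> 0).
    """
    total = sum(rho)
    if total <= 0 or any(v < 0 for v in rho):
        raise ValueError("density values must be non-negative with a positive sum")
    s = 0.0
    for v in rho:
        p = v / total
        if p > 0:
            s -= p * math.log(p)
    return s


flat = [1.0] * 8                                   # featureless map
peaked = [0.1, 0.1, 5.0, 0.1, 0.1, 3.0, 0.1, 0.1]  # two "atoms"
print(density_entropy(flat), math.log(8))          # uniform map reaches log N
print(density_entropy(peaked))                     # any structure lowers S
```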
STRUCTURE REFINEMENT
In small-molecule crystallography, X-ray amplitudes are available to atomic or near-atomic resolution, and full-matrix least-squares refinement is used to improve the quality of the structure. A residual R,

$$R = \frac{\sum_{hkl} \big|\,|F_{hkl}(\mathrm{obs})| - |F_{hkl}(\mathrm{calc})|\,\big|}{\sum_{hkl} |F_{hkl}(\mathrm{obs})|} \tag{10}$$
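is minimized with respect to the coordinates and thermal parameters of the atoms, each atom being modeled as:

$$\rho(\mathbf{r}) = \rho_0 \exp\!\left(-\left|\mathbf{B}_i(\mathbf{r} - \mathbf{r}_i)\right|^2\right) \tag{11}$$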
where atom i (the i-th atom of the structure) is located at r_i and B_i is a symmetric tensor describing the thermal motion of atom i as an ellipsoid. The refinement is carried out in reciprocal space by calculating the structure factor:

$$F_{hkl}(\mathrm{calc}) = \int_V \rho(\mathbf{r}) \exp(2\pi i\,\mathbf{h}\cdot\mathbf{r})\, d^3\mathbf{r} \tag{12}$$
where the integration is done over the entire volume V of the unit cell of the crystal, h is the vector of Miller indices (h, k, l), and r is expressed in fractional coordinates. Elements of the normal matrix constructed from this equation are the derivatives of the structure factors with respect to r_i and the thermal tensor B_i. The number of parameters for anisotropic refinement of an individual atom is 3 + 6 = 9; this number is 4 (3 + 1) for isotropic temperature-factor refinement. For small molecules diffracting to atomic resolution, the ratio of observed data to refinable parameters is large enough for least-squares refinement to be stable (Hendrickson, 1985). Most macromolecules, however, do not diffract to atomic resolution, so full-matrix least-squares refinement usually cannot be used for them. In the late 1960s, Diamond (1971) suggested a real-space technique of structure refinement in which the following quantity is minimized:

$$\int \big(\rho(\mathbf{r}) - \rho_m(\mathbf{r})\big)^2\, d^3\mathbf{r} \tag{13}$$
where ρ(r) and ρ_m(r) are the electron densities obtained from the Fourier transforms of the observed structure factors and of the model, respectively. In the late 1970s, Hendrickson (Hendrickson and Konnert, 1980) used restrained conjugate-gradient least-squares techniques. Knowledge of stereochemistry (ideal values of bond lengths, bond angles, planarity, chiral volumes, etc.) was included in the refinement as additional observations with appropriate weights, so that the ratio of observations to refinable parameters was increased. Weights for the stereochemical terms are assigned from information such as the standard deviations of bond lengths in refined X-ray crystal structures of amino acids and peptides. At the start of refinement with a crude model, medium-resolution data (to 4 Å or 3 Å) are often used, allowing shifts of up to 1 Å in the model. As the refinement progresses, higher-resolution data are added in shells of resolution. A round of refinement has converged when parameter shifts are negligible or zero in two consecutive cycles. A difference map or a 2Fo-Fc map is then calculated, and the model is rebuilt manually with the help of graphics programs such as FRODO (Jones, 1978, 1986) or O (Jones et al., 1991). During rebuilding, atoms are moved to fit the electron density better, extra atoms are added to fit unassigned density, solvent molecules are added, and incorrectly placed atoms are deleted. This cycle of refinement and model building is iterated until the difference map is featureless. Until recently this was the method of choice for refinement of protein structures.

The conventional conjugate-gradient least-squares programs of Hendrickson and others search for the minimum by gradient descent. If the starting model is not close to the final model, gradient descent cannot cross the uphill barriers associated with peptide flips, large movements of the main chain, and so forth, and becomes trapped in a local minimum. A major recent advance in macromolecular refinement came through the introduction of molecular dynamics (Brunger et al., 1987) and simulated annealing, which explore large areas of conformational space starting from an initial model. Molecular dynamics is a technique in which energy is pumped into a macromolecular system by raising its "temperature," and the coordinates and velocities of the atoms are allowed to vary according to Newton's laws of motion.
The initial set of crystallographic coordinates r_i is obtained either from an initial chain trace of a multiple isomorphous replacement map or from a molecular replacement solution. These coordinates are assigned initial velocities v_i(t = 0) = dr_i/dt, where the directions of v_i are random and their magnitudes are given by:

$$v_i^2(t = 0) \propto T \tag{14}$$
T is a nominal temperature. If atom i, with mass m_i, is acted upon by forces F_ki, then the structure will change according to:

$$\mathbf{a}_i(0) = \frac{d^2\mathbf{r}_i}{dt^2} = \frac{\sum_k \mathbf{F}_{ki}}{m_i} \tag{15}$$
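If the time step is small enough (on the order of femtoseconds), the velocity and displacement at time Δt can be approximated as:

$$\mathbf{v}_i(\Delta t) = \mathbf{v}_i(0) + \Delta t\,\mathbf{a}_i(0) \tag{16}$$

$$\mathbf{r}_i(\Delta t) = \mathbf{r}_i(0) + \Delta t\,\mathbf{v}_i(0) \tag{17}$$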
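Equations 14 to 17 amount to a simple Euler integration step. The sketch below follows them literally for one atom in three dimensions. Several pieces are hypothetical, and all values are in arbitrary units: the proportionality constant in Equation 14 (called `scale` here), the harmonic tether used as a stand-in force, and the numerical values themselves. Production molecular dynamics codes generally use more stable integrators, such as the Verlet family, rather than this first-order scheme.

```python
import math
import random


def initial_velocity(temperature, scale=1.0, rng=random):
    """Equation 14: random direction, |v|^2 = scale**2 * T, so |v|^2 is proportional to T.

    `scale` is a hypothetical proportionality constant (in a real program it
    would depend on the atomic mass and the units used).
    """
    while True:
        d = [rng.gauss(0.0, 1.0) for _ in range(3)]  # isotropic random direction
        norm = math.sqrt(sum(c * c for c in d))
        if norm > 0.0:
            break
    speed = scale * math.sqrt(temperature)
    return [speed * c / norm for c in d]


def euler_step(r, v, forces, mass, dt):
    """One step of Equations 15-17 for a single atom.

    a = (sum of forces)/m; v(dt) = v(0) + dt*a(0); r(dt) = r(0) + dt*v(0).
    """
    a = [sum(f[k] for f in forces) / mass for k in range(3)]
    r_new = [r[k] + dt * v[k] for k in range(3)]  # Equation 17 uses v(0)
    v_new = [v[k] + dt * a[k] for k in range(3)]  # Equation 16
    return r_new, v_new


# Usage: one atom tethered to the origin by a hypothetical harmonic force.
rng = random.Random(0)
r = [1.0, 0.0, 0.0]
v = initial_velocity(300.0, scale=0.01, rng=rng)
k_spring, mass, dt = 1.0, 12.0, 0.5  # arbitrary units
for _ in range(10):
    spring = [-k_spring * c for c in r]
    r, v = euler_step(r, v, [spring], mass, dt)
print(r, v)
```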
In crystallographic refinement incorporating molecular dynamics, the forces F_ki (i.e., the force of type k on atom i) include pseudo-forces in addition to the conventional chemical forces. These pseudo-forces restrain the calculated structure-factor amplitudes to be similar to the observed ones. The (pseudo-)potential energy function to be minimized is:

$$E = E_{\mathrm{chem}} + \frac{1}{\sigma_X}\sum_{hkl}\big(|F_{hkl}(\mathrm{obs})| - |F_{hkl}(\mathrm{calc})|\big)^2 \tag{18}$$

where E_chem includes terms for bond lengths, bond angles, electrostatic forces, van der Waals forces, and so on, and σ_X is a weighting factor. In a standard crystallographic refinement, the system is heated to a typical temperature of T = 4000 K. It is then slowly cooled against a heat bath to T = 300 K, with T lowered by 25 K after every 25 to 50 time steps of 0.5 fs. This process explores a large area of conformational space and can make large adjustments to atomic positions. XPLOR (Brunger, 1988) and GROMOS (Gros et al., 1990) are two powerful packages for macromolecular structure refinement that reduce the time the user spends on manual model building. Another recent development in refinement is the automated refinement program of Lamzin and Wilson (1993). For a well-diffracting crystal (2.0 Å resolution or better), this program can start from a poor initial model and add atoms, delete wrongly placed atoms, and add solvent molecules. The crystallographer's input nevertheless remains critical, even with "automatic" or "semi-automatic" refinement programs. It is needed to check and judge the accuracy and validity of the refined models, that is, to ensure that they make chemical sense and agree with known biochemical and biological facts.
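The slow-cooling protocol described above can be made concrete with a little arithmetic. This sketch assumes, as a reading of the description rather than a documented program setting, that the temperature is held for a fixed number of 0.5 fs steps at each stage before being lowered by 25 K. It only counts stages and simulated time; it does not perform any dynamics.

```python
def slow_cool_schedule(t_start=4000.0, t_end=300.0, delta_t=25.0):
    """Temperatures visited when T is lowered by delta_t from t_start down to t_end."""
    temps = []
    t = t_start
    while t >= t_end - 1e-9:  # small tolerance guards against rounding
        temps.append(t)
        t -= delta_t
    return temps


def simulated_time_fs(n_stages, steps_per_stage, dt_fs=0.5):
    """Total simulated time if each temperature stage runs steps_per_stage steps of dt_fs."""
    return n_stages * steps_per_stage * dt_fs


temps = slow_cool_schedule()
print(len(temps), temps[0], temps[-1])  # 149 4000.0 300.0
for steps in (25, 50):
    fs = simulated_time_fs(len(temps), steps)
    print(f"{steps} steps per stage: {fs:.1f} fs in total")
```

Under these assumptions the whole cooling run covers only a few picoseconds of simulated time.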
FINAL MODEL AND VALIDITY OF THE STRUCTURE
During cycles of model building and refinement, electron density maps are generated from calculated phases, which introduces a degree of model bias into the maps. To reduce this bias, Fourier maps are generated with 2|Fo| - |Fc| as the amplitude, or in general m|Fo| - n|Fc| (with m = n + 1). If experimental sources of phases, such as MIR, are available, experimental and calculated phases can be combined with appropriate weights to reduce the bias further. Software for this includes SIGMAA (Read, 1986) and COMBINE (Z. Otwinowski, private communication). Another way of checking validity and reducing bias is to calculate a series of omit maps. In each, a fragment (or fragments) is omitted from the phase calculation before the electron density map is computed. The omit map then shows the fragment without bias arising from its presence during the phase calculation. XPLOR can calculate an annealed omit map. A fragment is omitted, the rest of the molecule undergoes a short period of molecular dynamics, and the atoms are then refined. The electron density map calculated afterwards reveals the omitted fragment. The molecular dynamics run is intended to remove any residual "memory" of the omitted fragment from the phase set. Improved detection technology, such as charge-coupled device (CCD) detectors, is arriving alongside stronger X-ray sources, clever software, and faster computing. Together these make the future of macromolecular crystallography and refinement ever more exciting.
REFERENCES
Abrahams, J.P. and Leslie, A.G. (1996). Methods used in the structure determination of bovine mitochondrial F1 ATPase. Acta Crystallogr., Sect. D, 52, 30-42.
Arndt, U.W., Champness, J.N., Phizackerley, R.P., and Wonacott, A.J. (1973). Single-crystal oscillation camera for large unit cells. J. Appl. Crystallogr. 6, 457-463.
Blow, D.M. and Crick, F.H.C. (1959). The treatment of errors in the isomorphous-replacement method. Acta Crystallogr. 12, 794-802.
Brunger, A.T. (1988). Crystallographic refinement by simulated annealing. Application to a 2.8 Å resolution structure of aspartate aminotransferase. J. Mol. Biol. 203, 803-816.
Brunger, A.T., Kuriyan, J., and Karplus, M. (1987). Crystallographic R-factor refinement by molecular dynamics. Science 235, 458-460.
Carter, C.W., Jr. (Ed.) (1990). Protein and nucleic acid crystallization. Methods 1, 1-127.
Carter, C.W., Jr. and Carter, C.W. (1979). Protein crystallization using incomplete factorial experiments. J. Biol. Chem. 254, 12219-12223.
Charpak, G., Boucher, R., Bressani, T., Favier, J., and Zupancic, C. (1968). Some read-out systems for proportional multiwire chambers. Nucl. Instrum. Methods 62, 262.
Cork, C., Fehr, D., Hamlin, R., Vernon, W., Xuong, Ng.-H., and Perez-Mendez, V. (1974). Multiwire proportional chamber as an area detector for protein crystallography. J. Appl. Crystallogr. 7, 319-323.
Cowtan, K.D. and Main, P. (1993). Improvement of macromolecular electron-density maps by the simultaneous application of real and reciprocal space constraints. Acta Crystallogr., Sect. D, 49, 148-157.
Cox, M.J. and Weber, P.C. (1987). Experiments with automated protein crystallization. J. Appl. Crystallogr. 20, 366-373.
Davis, S., Puklavec, M.J., Ashford, D.A., Harlos, K., Jones, E.Y., Stuart, D.I., and Williams, A.F. (1993). Expression of soluble recombinant glycoproteins with predefined glycosylation: Application to the crystallization of the T-cell glycoprotein CD2. Protein Engineering 6, 229-232.
DeLucas, L.J., Smith, C.D., Smith, H.W., Vijay-Kumar, S., Senadhi, S.E., Ealick, S.E., Carter, D.C., Snyder, R.S., Weber, P.C., Salemme, F.R., Taylor, G., Stammers, D.K., Powell, K., Darby, G., and Bugg, C. (1989). Protein crystal growth in microgravity. Science 246, 651-654.
Diamond, R. (1971). Real-space refinement procedure for proteins. Acta Crystallogr., Sect. A, 27, 436-452.
Ducruix, A. and Giege, R. (1992). Crystallization of Nucleic Acids and Proteins: A Practical Approach. Oxford University Press, Oxford, England.
Dyda, F., Hickman, A.B., Jenkins, T.M., Engelman, A., Craigie, R., and Davies, D.R. (1994). Crystal structure of the catalytic domain of HIV-1 integrase: Similarity to other polynucleotidyl transferases. Science 266, 1981-1986.
Fitzgerald, P.M.D. (1988). MERLOT, an integrated package of computer programs for the determination of crystal structures by molecular replacement. J. Appl. Crystallogr. 21, 273-278.
Gilliland, G.L. and Bickham, D.M. (1990). The biological macromolecular crystallization database: A tool for developing crystallization strategies. Methods 1, 6-11.
Gros, P., van Gunsteren, W.F., and Hol, W.G. (1990). Inclusion of thermal motion in crystallographic structures by restrained molecular dynamics. Science 249, 1149-1152.
Hajdu, J., Acharya, K.R., Stuart, D.I., McLaughlin, P.J., Barford, D., Oikonomakos, N.G., Kein, H., and Johnson, L.N. (1987). Catalysis in the crystal: Synchrotron radiation studies with glycogen phosphorylase b. EMBO J. 6, 539-546.
Harada, S., Yasui, M., Masanori, Y., Murakawa, K., Kasai, N., and Satow, Y. (1986). Crystal-structure analysis of cytochrome c' by the multiwavelength anomalous diffraction method using synchrotron radiation. J. Appl. Crystallogr. 19, 448-452.
Harker, D. (1956). The determination of the phases of the structure factors of noncentrosymmetric crystals by the method of double isomorphous replacement. Acta Crystallogr. 9, 1-9.
Harrison, R.W. (1989). Minimization of cross entropy: A tool for solving crystal structures. Acta Crystallogr., Sect. A, 45, 4-10.
Helliwell, J.R., Habash, J., Cruickshank, D.W.J., Harding, M.M., Greenhough, T.J., Campbell, J.W., Clifton, I.J., Elder, M., Machin, P.A., Papiz, M.Z., and Zurek, S. (1989). The recording and analysis of synchrotron X-radiation Laue diffraction photographs. J. Appl. Crystallogr. 22, 483-497.
Hendrickson, W.A. (1985). Stereochemically restrained refinement of macromolecular structures. Methods Enzymol. 115, 252-270.
Hendrickson, W.A. and Konnert, J.H. (1980). Incorporation of stereochemical information into crystallographic refinement. In: Computing in Crystallography (Diamond, R., Ramaseshan, S., and Venkatesan, K., Eds.), pp. 13.01-13.25. Indian Acad. Sci., Bangalore, India.
Hendrickson, W.A. and Ward, K.B. (1987). Imaging Plate Detectors for Synchrotron Radiation. Howard Hughes Medical Institute Scientific Conference Center, Coconut Grove, FL.
Howard, A.J., Nielsen, C., and Xuong, Ng.-H. (1985). Software for a diffractometer with multiwire area detector. Methods Enzymol. 114, 452-472.
Howard, A.J., Gilliland, G.L., Finzel, B.C., Poulos, T.L., Ohlendorf, D.H., and Salemme, F.R. (1987). The use of an imaging proportional counter in macromolecular crystallography. J. Appl. Crystallogr. 20, 383-387.
Jancarik, J. and Kim, S.H. (1991). Sparse-matrix sampling: A screening method for crystallization of proteins. J. Appl. Crystallogr. 24, 409-411.
Jones, N.D., Decter, J.B., Swartzenderber, J.K., and Landis, R.L. (1987). Amer. Crystallogr. Assoc. Meet., March 15-20, Austin, Texas, H-4 (Abstr.).
Jones, T.A. (1978). A graphics model building and refinement system for macromolecules. J. Appl. Crystallogr. 11, 268-272.
Jones, T.A. (1986). Interactive computer graphics: FRODO. Methods Enzymol. 115, 157-171.
Jones, T.A., Zou, J.-Y., Cowan, S.W., and Kjeldgaard, M. (1991). Improved methods for building protein models in electron-density maps and the location of errors in these models. Acta Crystallogr., Sect. A, 47, 110-119.
Kabsch, W. (1988). Evaluation of single-crystal X-ray diffraction data from a position-sensitive detector. J. Appl. Crystallogr. 21, 916-924.
Lamzin, V.S. and Wilson, K.S. (1993). Automated refinement of protein models. Acta Crystallogr., Sect. D, 49, 129-147.
Laver, W.G. (1990). Crystallization of antibody-protein complexes. Methods 1, 70-74.
McPherson, A. (1982). The Preparation and Analysis of Protein Crystals. John Wiley and Sons, New York.
McPherson, A. (1990). Current approaches to macromolecular crystallization. Eur. J. Biochem. 189, 1-23.
Miller, M., Harrison, R., Wlodawer, A., Appella, E., and Sussman, J.L. (1988). Crystal structure of 15-mer DNA duplex containing unpaired bases. Nature 334, 85-86.
Miller, M., Jaskolski, M., Rao, J.K.M., Leis, J., and Wlodawer, A. (1989). Crystal structure of a retroviral protease proves relationship to aspartic protease family. Nature 337, 576-579.
Otwinowski, Z. and Gewirth, D. (1993). Denzo Manual. Yale University, New Haven, CT.
Patterson, A.L. (1935). A direct method for the determination of the components of interatomic distances in crystals. Z. Krist. 90, 517-542.
Prince, E., Sjolin, L., and Alenljung, R. (1988). Phase extension by combined entropy maximization and solvent flattening. Acta Crystallogr., Sect. A, 44, 216-222.
Read, R.J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Crystallogr., Sect. A, 42, 140-149.
Richards, F.M. (1985). Optical matching of physical models and electron density maps: Early developments. Methods Enzymol. 115, 145-154.
Rodgers, D.W. (1994). Cryocrystallography. Structure 2, 1135-1140.
Rossmann, M.G. and Blow, D.M. (1962). The detection of subunits within the crystallographic asymmetric unit. Acta Crystallogr. 15, 24-31.
Shannon, C.E. (1948). A mathematical theory of communication. Bell Syst. Tech. J. 27, 379-423, 623-656.
Shrive, A.K., Clifton, I.J., Hajdu, J., and Greenhough, T.J. (1990). Laue film integration and deconvolution of spatially overlapping reflections. J. Appl. Crystallogr. 23, 169-174.
Stout, G.H. and Jensen, L.H. (1968). X-ray Structure Determination. The Macmillan Company, London, England.
Stura, E.A. and Wilson, I.A. (1990). Analytical and production seeding techniques. Methods 1, 38-49.
Wang, B.C. (1985). Resolution of phase ambiguity in macromolecular crystallography. Methods Enzymol. 115, 90-112.
Weber, P.C. (1991). Physical principles of protein crystallization. Adv. Protein Chem. 41, 1-36.
Xuong, Ng.-H. and Freer, S.T. (1971). Reflection intensity measurement by screenless precession photography. Acta Crystallogr., Sect. B, 27, 2380-2387.
Zeppezauer, M. (1971). Formation of large crystals. Methods Enzymol. 22, 253-266.
Zulauf, M. and D'Arcy, A. (1992). Light scattering of proteins as a criterion for crystallization. J. Crystal Growth 122, 102-106.
Chapter 2
The Chemistry of Protein Functional Groups
GARY E. MEANS, HAO ZHANG, and MIN LE
Abstract
Introduction
Modification of Amino Groups (α-NH2 and Lysine)
  Reductive Methylation
  Amidination
  Maleic Anhydride
  Trinitrobenzenesulfonate
  Selective Modifications of α- or ε-Amino Groups
Modification of Imidazole Groups (Histidine)
  Diethyl Pyrocarbonate
Modification of Guanidino Groups (Arginine)
  Butanedione
  Phenylglyoxal
Modification of Carboxyl Groups (α-COOH, Aspartate, and Glutamate)
  Water-soluble Carbodiimides and Glycine Ethyl Ester
Modification of Carboxamide Groups (Asparagine and Glutamine)
  Deamidation
Modification of Sulfhydryl Groups (Cysteine)
  N-Ethylmaleimide
  Methyl Methanethiosulfonate
  Dithio(2-nitrobenzoate)
  Dipyridyl Disulfide
  Selective Reactions with Vicinal Sulfhydryl Groups
Modification of Disulfide Bonds (Cystine)
  Reduction by Dithiothreitol and Other Thiols
Modification of Thioether Groups (Methionine)
  Hydrogen Peroxide
  Chloramine T
Modification of Indole Groups (Tryptophan)
  N-Bromosuccinimide
Modification of Phenolic Groups (Tyrosine)
  Iodination
  Tetranitromethane
  N-Acetylimidazole

Protein: A Comprehensive Treatise, Volume 2, pages 23-59. Copyright © 1999 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 1-55938-672-X
ABSTRACT
The physical, chemical, and biological properties of proteins are determined to some extent by the properties of their constituent amino acid side-chains or functional groups. Reagents and procedures are described for selective chemical modification of the major types of functional groups. Those reagents and procedures can be used to alter the properties of individual proteins and to identify functional groups required for the catalytic activities of enzymes and those responsible for the other properties of biologically important proteins.
INTRODUCTION
Amino acid side chains containing oxygen, nitrogen, or sulfur atoms are required for the catalytic activities of enzymes and for the biological properties of most other proteins. They are called functional groups because of their roles in catalytic mechanisms and/or other functions. They can act as acids, bases, nucleophiles, electrophiles, electrostatic charges, hydrogen-bond donors and acceptors, and so forth. The particular side chains required for the biological activity of a protein can often be determined from the effect(s) of their modification on that activity. Side chains composed entirely of carbon and hydrogen atoms are just as surely necessary, but they are not usually called functional groups and are not usually subject to chemical modification. Some properties of those side chains and of the so-called functional groups are presented in Tables 1 and 2. α-Amino and α-carboxyl groups are similar in most respects to side-chain amino and carboxyl groups and will be discussed in the same sections. Less common functional groups will not be addressed. These include those of γ-carboxyglutamate, phosphoserine, and phosphotyrosine, and the O- and N-linked glycosyl groups resulting from various posttranslational modifications.
Table 1. Physical Parameters of Amino Acid Residues

Residue   Van der Waals     Surface Area (Å²)^b     Hydrophobicity^c
          Volume (Å³)^a     Total    Side chain     (kcal/mol)
Ala        67               113       67            -0.5
Arg       148               241      196             3.0
Asn        96               158      113             0.2
Asp        91               151      106             2.5
1/2Cys     86               140      104            -1.0
Gln       114               189      144             0.2
Glu       109               183      138             2.5
Gly        48                83       —              0
His       118               194      151            -0.5
Ile       124               182      140            -1.8
Leu       124               180      137            -1.8
Lys       135               211      167             3.0
Met       124               204      160            -1.3
Phe       135               218      175            -2.5
Pro        90               143      105            -1.4
Ser        73               122       80             0.3
Thr        93               146      102            -0.4
Trp       163               259      217            -3.4
Tyr       141               229      187            -2.3
Val       105               160      117            -1.5

Notes: ^a Creighton, 1993. ^b Miller et al., 1987. ^c Levitt, 1976.
Proteins are subject to chemical modification for many different purposes. They are sometimes modified, for example, to increase or decrease their solubility, to promote or discourage subunit dissociation, or to alter their susceptibility to proteolysis. Modification is also used to stabilize or protect unstable structures during sequence determination and in other kinds of structural studies. Modification procedures are sometimes used to introduce isotopic, fluorescent, and other kinds of spectroscopic labels. They also serve to attach or conjugate one protein to another protein, to an insoluble support, or to some other substance, and to determine spatial relationships between side-chain groups. Chemical modification procedures are also sometimes used to determine the number of certain amino acid residues. They are frequently the simplest and most direct way to identify particular amino acid residues required for biological activity.
Table 2. The Properties of Ionizable Groups of Side Chains in Proteins

Residue   pKa^a                  ΔH^b
          Nominal   Range        (kcal/mol)
Arg       12.0      —            12.4
Asp        4.0      3.9-4.0       1.2
Cys        8.7      9.0-9.5       8.6
Glu        4.4      4.3-4.5       1.2
His        6.5      6.0-7.0       4.0
Lys       10.5      10.4-11.1    11.0
Ser       14.2      —             —
Thr       15.0      —             —
Tyr       10.1      10.0-10.3     6.0

Notes: ^a Dixon and Webb, 1964; Creighton, 1993; Kyte, 1995. ^b Dixon and Webb, 1964; Fasman, 1974.
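The nominal pKa values and ionization enthalpies in Table 2 can be used with two standard relations. The Henderson-Hasselbalch equation gives the fraction of a group that is deprotonated at a given pH. The van 't Hoff equation gives how the pKa shifts with temperature. The sketch below assumes that the nominal pKa values refer to 25 °C and that ΔH is independent of temperature; both are assumptions, not statements from the table. Side chains in real proteins may also deviate from these nominal values, as the ranges in Table 2 suggest.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)

# Nominal pKa and ionization enthalpy (kcal/mol) taken from Table 2.
GROUPS = {"His": (6.5, 4.0), "Cys": (8.7, 8.6), "Tyr": (10.1, 6.0), "Lys": (10.5, 11.0)}


def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: [A-]/([HA] + [A-]) = 1 / (1 + 10**(pKa - pH))."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def pka_at(pka_ref, dh_kcal, t_ref=298.15, t=310.15):
    """van 't Hoff: pKa(T) = pKa(Tref) + dH/(ln(10)*R) * (1/T - 1/Tref)."""
    return pka_ref + dh_kcal / (math.log(10) * R_KCAL) * (1.0 / t - 1.0 / t_ref)


for name, (pka, dh) in GROUPS.items():
    print(f"{name}: pKa {pka:5.2f} at 25 C, {pka_at(pka, dh):5.2f} at 37 C; "
          f"fraction deprotonated at pH 7.0 (25 C) = {fraction_deprotonated(pka, 7.0):.3f}")
```

A positive ionization enthalpy lowers the pKa as the temperature rises. With ΔH = 4.0 kcal/mol, the His pKa drops by roughly 0.1 unit between 25 and 37 °C.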
Table 3 lists some of the most widely used and most effective reagents and procedures for identifying functional groups required for the biological activities of proteins. It also indicates the extent to which each can be expected to react with the different protein functional groups. More reagents might have been included, but the list is purposely short, to emphasize the reagents thought to be most useful. They were selected particularly for their specificity for a single functional group under conditions compatible with the biological activity of most proteins. Other considerations were:
- minimal change in the size or charge of the side chain, and hence, usually, minimal effect on protein structure;
- ease of detecting, following, or quantifying the modification;
- formation of chemically stable derivatives that can be isolated and characterized;
- the possibility of subsequently regenerating the original side chain.
These and other desirable attributes are discussed for each of the reagents described below. More extensive discussions of protein modification reagents and procedures are available elsewhere (Hirs, 1967; Means and Feeney, 1971a; Hirs and Timasheff, 1972, 1977, 1983; Glazer and Delange, 1975; Lundblad and Noyes, 1984; Eyzaguirre, 1987; Imoto and Yamada, 1989; Wong, 1991; Lundblad, 1991, 1995). A similar list including more than a hundred reagents has been published elsewhere (Means and Feeney, 1993). Most of the reagents listed in Table 3 affect more than one functional group. Differences in reactivity are frequently large, however, and selective modification can usually be achieved by using limited amounts of reagent. N-Ethylmaleimide, for example, reacts readily with a wide range of nucleophiles but about 1000 times faster with sulfhydryl groups than with other common functional groups (Brewer and Riehm, 1967). In limited amounts, it is therefore usually very specific for them.
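The practical effect of a 1000-fold difference in rate can be estimated with simple pseudo-first-order kinetics. The sketch below assumes reagent in excess, so that its concentration is effectively constant. It also assumes equal accessibility of the groups and relative rate constants of 1 and 1/1000. It ignores reagent depletion, so it is only a rough guide to the behavior with limited amounts of reagent.

```python
import math


def fraction_modified(k_rel, exposure):
    """Pseudo-first-order modification: fraction = 1 - exp(-k_rel * exposure)."""
    return 1.0 - math.exp(-k_rel * exposure)


# Reagent "exposure" (concentration x time, arbitrary units) chosen so that 90%
# of thiols react, assuming k(thiol) = 1 and k(other nucleophile) = 1/1000.
exposure = math.log(10)  # 1 - exp(-ln 10) = 0.90
thiol = fraction_modified(1.0, exposure)
other = fraction_modified(1e-3, exposure)
print(f"thiol modified: {thiol:.3f}, other nucleophile modified: {other:.4f}")
```

Under these assumptions, modifying 90% of the thiols modifies only about 0.2% of a group that reacts 1000 times more slowly.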
Table 3. The Specificity of Reagents and Procedures for Chemical Modification of Protein Functional Groups
AMINOᶜ GUANIDINO IMIDAZOLE CARBOXYL
AMINO GROUPS Citraconic anhydride Maleic anhydride Methyl acetimidate Reductive methylation Trinitrobenzenesulfonic acid
N
V
+++d
-
+++d
-
+++ +++ +++
f f
-
-
-
-
-
-
-
THIOL
+ +
-
THIOETHER DISULFIDE PHENOL
-
-
k
INDOLE
-
-
5 -
-
-
-
-
te
-
-
IMIDAZOLE GROUPS Diethyl pyrocarbonate
+
-
+++d
k
+d
-
-
+d
-
GUANIDINO GROUPS Butanedioneʰ Phenylglyoxal
+'
-
+++
-
-
-
-
-
-
-
-
-
+
-
-
-
+d
-
CARBOXYL GROUPS Water soluble carbodiimide + glycine ethyl ester
DISULFIDE BONDS Dithiothreitol
+++
+g
-
-
+++
+
-
-
-
-
-
-
-
+++d
-
-
(continued)
Table 3. Continued
AMINOᶜ GUANIDINO IMIDAZOLE CARBOXYL
THIOETHER GROUPS Hydrogen peroxide Chloramine T
-
-
++
-
-
-
-
-
?
-
-
-
THIOL
+++ +++
THIOETHER DISULFIDE
++d
++d
PHENOL
INDOLE
-
-
-
-
+++d
-
INDOLE GROUPS N-Bromosuccinimide N-Chlorosuccinimide PHENOL GROUPS N-Acetyl imidazole
N
Q,
Iodine
-
Tetranitromethane
-
++
-
It
-
+++
+
+++
-
+++ +++
+ +
Notes: ᵃ The reagents included were selected from a longer list of reagents (Means and Feeney, 1993) on the basis of their widespread, general usefulness in determining functional groups required for biological activity. The indicated specificities are those expected under the conditions usually employed, as described in the text.
ᵇ The indicated reaction specificities are as follows: +++ highly reactive, extensive reaction under typical conditions; ++ significant reaction should be expected; + some reaction is possible but is not usually extensive; - no reaction expected under typical conditions; ± a significant reaction usually takes place, but the resulting derivatives are unstable and usually break down to regenerate the original side chain group.
ᶜ Reagents that affect ε-amino groups usually also affect α-amino groups.
ᵈ Can be converted back to the unmodified side chain under relatively mild conditions.
ᵉ May be hazardous if handled improperly; check the literature before using.
ᶠ Reaction requires two closely spaced thiol groups.
ᵍ In the absence of an added nucleophile, may give rise to crosslinks with nearby carboxyl moieties.
ʰ At pH 7 to 9 in the presence of borate.
ⁱ Usually affects only α-amino groups.
Sulfhydryl groups are generally among the most reactive functional groups in proteins and are therefore usually among the easiest to modify. Because of their high reactivity, however, they also sometimes interfere with the modification of other functional groups. N-Bromosuccinimide, for example, is widely used to modify tryptophan residues, but usually not in proteins that contain sulfhydryl groups, which almost always react even faster. For similar reasons, the use of other oxidizing and electrophilic reagents is also usually limited to proteins without sulfhydryl groups. The stability of the products of a modification procedure is sometimes just as important as their rate of formation. Under the conditions usually employed to acylate amino groups (i.e., pH ~8.5 to 10), for example, sulfhydryl, imidazole, and phenolic groups are also normally acylated to some extent, but the resulting products are usually unstable: they either hydrolyze spontaneously to regenerate the original side chains or can be readily deacylated by subsequent treatment with hydroxylamine. The specificity of acylating agents for amino groups under such conditions is thus due both to the high reactivity of amino groups and to the stability of the products. At lower pH (i.e., ~6 to 7), where most amino groups are strongly protonated and therefore unreactive, the imidazole moieties of histidine residues are still largely unprotonated and are usually the only residues susceptible to acylation. Under such conditions, diethyl pyrocarbonate, which gives particularly stable acyl derivatives, is relatively specific for histidine residues. By choosing an appropriate acylating agent and pH, acylation can thus be made quite specific for either the amino or the imidazole moieties of proteins. Reagents for selectively modifying the major protein functional groups are described below.
MODIFICATION OF AMINO GROUPS (α-NH2 AND LYSINE)
A large number of reagents are used to chemically modify amino groups in proteins. Because amino groups are usually relatively abundant and quite reactive, and the products are usually stable, they are frequently used as sites for introducing various kinds of labels and probes and for crosslinking, in order to conjugate proteins to each other, to insoluble supports, and to other substances. Most of the reagents react to some extent with both α- and ε-amino groups. Table 3 indicates the extent to which they are also likely to affect other functional groups.
Reductive Methylation
Reductive methylation (Equation 1) is very specific for amino groups in proteins. Reactions are usually conducted at about pH 6 to 9, depending mainly on the reductant, and although methylation slightly increases the size of the amino groups, it has little effect on their pKa values or on the overall charge of most proteins (Means and Feeney, 1968, 1995; Means, 1977; Jentoft and Dearborn, 1979).
$$\mathrm{(P){-}NH_3^+} \xrightarrow[\text{NaBH}_4\text{, pH 8-9, or NaBH}_3\text{CN, pH 6-7}]{\mathrm{CH_2O}} \mathrm{(P){-}NH_2^+CH_3} \xrightarrow[\text{NaBH}_4\text{, pH 8-9, or NaBH}_3\text{CN, pH 6-7}]{\mathrm{CH_2O}} \mathrm{(P){-}NH^+(CH_3)_2} \qquad (1)$$
where (P) denotes the protein.
Because it has so few effects, reductive methylation is sometimes used to introduce isotopic labels into proteins. ¹⁴C- and ³H-labels can be introduced using ¹⁴C- or ³H-labeled formaldehyde or ³H-labeled sodium borohydride; although specific activities are usually lower than can be obtained by radioiodination, physical and biological properties are less likely to be affected and the radiological half-lives are much longer (Rice and Means, 1971; Ascoli and Puett, 1974; Jentoft and Dearborn, 1979; Tack et al., 1980; Means and Feeney, 1995). ¹³C- and ²H-labels can also be introduced with appropriately labeled precursors and then characterized by ¹³C- or ²H-NMR (Jentoft et al., 1979; Jentoft and Dearborn, 1979, 1983; Zhang and Vogel, 1993). Depending on the conditions, purposes, and other circumstances, sodium borohydride, sodium cyanoborohydride, dimethylamine borane, or pyridine borane can all be used as reducing agents. The last three are usually employed in large excess; with them, incorporation of formaldehyde is usually very efficient, but the reactions are relatively slow. With sodium borohydride, some formaldehyde is also reduced to methanol, so its incorporation into proteins is usually less efficient. Reactions are usually complete in only a minute or two, however, and the reducing agent is used more efficiently, which is particularly important when ³H-labeled borohydride is used for radiolabeling (Tack et al., 1980). With all four reducing agents, monomethylamino groups are formed initially and rapidly converted into dimethylamino groups, which usually predominate except at very low levels of modification. The extent of reaction is usually controlled by the amount of formaldehyde employed and/or by the reaction time. Modification approaching 100% of the amino groups is not unusual and, in many cases, appears to have few or no obvious effects on physical or biological properties.
Under the reaction conditions usually employed, no significant side reactions have been described. The particular advantages, disadvantages, and special circumstances involved with each of the reducing agents have been described (Means and Feeney, 1995). Other carbonyl compounds can be used similarly to introduce a wide variety of substituents into proteins. Pyridoxal phosphate, for example, can be incorporated and, at relatively low concentrations, is sometimes used as an affinity label to modify amino groups in or near phosphate and other anion binding sites (Anderson et al., 1966; Rippa et al., 1967; Means and Feeney, 1971b; Dudkin et al., 1975). The UV-visible and fluorescence spectra of the resulting pyridoxamine phosphate moieties are sensitive to their environment and may sometimes be used to characterize the attachment sites or, after fragmentation, to identify sequences originating from those sites.
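Because each methyl group comes from one molecule of formaldehyde, the label incorporated with ¹⁴C-formaldehyde follows directly from the extent of mono- and dimethylation. The sketch below makes that bookkeeping explicit. The number of amino groups, the modified fractions, and the formaldehyde specific activity are hypothetical inputs, and it assumes every incorporated methyl carbon carries label at the specific activity of the formaldehyde used.

```python
from dataclasses import dataclass


@dataclass
class MethylationResult:
    methyl_groups: float      # mol CH3 incorporated per mol protein
    specific_activity: float  # radioactivity per mol protein (units follow the input)


def label_incorporated(n_amino: int, frac_dimethyl: float, frac_monomethyl: float,
                       formaldehyde_sa: float) -> MethylationResult:
    """Label incorporated by reductive methylation with labeled formaldehyde.

    n_amino: amino groups (alpha + epsilon) per protein molecule.
    frac_dimethyl, frac_monomethyl: fractions converted to -N(CH3)2 and -NHCH3
        (hypothetical values, e.g., from amino acid analysis).
    formaldehyde_sa: specific activity of the formaldehyde per mol (e.g., Ci/mol).
    """
    if frac_dimethyl + frac_monomethyl > 1.0:
        raise ValueError("modified fractions cannot exceed 1")
    # A dimethylamino group carries two formaldehyde-derived methyls, a
    # monomethylamino group one.
    methyls = n_amino * (2.0 * frac_dimethyl + frac_monomethyl)
    return MethylationResult(methyls, methyls * formaldehyde_sa)


if __name__ == "__main__":
    # Hypothetical protein with 10 amino groups, 80% dimethylated and 5%
    # monomethylated, using formaldehyde at 50 mCi/mmol (= 50 Ci/mol).
    r = label_incorporated(10, 0.80, 0.05, 50.0)
    print(r.methyl_groups, r.specific_activity)
```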
Amidination
Methyl and ethyl acetimidate are commercially available and can be used to selectively modify amino groups under very mild conditions (Equation 2).
$$\mathrm{(P){-}NH_3^+} + \mathrm{CH_3C({=}NH_2^+)OCH_3} \xrightarrow{\text{pH} \sim 8} \mathrm{(P){-}NH{-}C({=}NH_2^+)CH_3} + \mathrm{CH_3OH} + \mathrm{H^+} \qquad (2)$$
The resulting acetamidine moieties are relatively small, retain a cationic charge, and usually have few effects on protein structure (Hunter and Ludwig, 1962; Ludwig and Hunter, 1967; Browne and Kent, 1975; Makoff and Malcolm, 1981; Inman et al., 1983). Imido esters are also used to introduce many other kinds of substituents into proteins (Jue et al., 1978; Plapp, 1970; Riley and Perham, 1973). Bifunctional imido esters such as dimethyl suberimidate, for example, are particularly useful as crosslinking agents (Davies and Stark, 1970).
Maleic Anhydride
Maleic anhydride, citraconic (i.e., methylmaleic) anhydride, and several related dicarboxylic acid anhydrides are widely used to reversibly modify amino groups in proteins (Equation 3).
$$\mathrm{(P){-}NH_3^+} + \text{maleic anhydride} \xrightarrow{\text{pH 8-9}} \mathrm{(P){-}NH{-}CO{-}CH{=}CH{-}COO^-} + 2\,\mathrm{H^+} \qquad (3)$$
The reaction is usually done under slightly alkaline conditions (pH ~8-9) and usually affects only amino groups (Butler et al., 1967, 1969; Atassi and Habeeb, 1972; Shetty and Kinsella, 1980; Aviram et al., 1981). Under those conditions, the introduced substituents are negatively charged and stable. The increased negative charge frequently increases the solubility of proteins, may bring about the dissociation of subunits, and sometimes has other effects on protein structure (Shetty and Kinsella, 1980; Aviram et al., 1981). At low pH (i.e., ~3.5 or lower), the maleamide moieties are uncharged and, more importantly, undergo relatively slow deacylation to regenerate the original amino groups according to Equation 4. This ability to deacylate the modified amino groups, essentially reversing the modification, is very useful for a number of purposes. Recovery of an enzyme's catalytic activity or other properties lost during the modification, for example, can be a strong indicator of the specificity of the reaction. Like most procedures for modifying lysine residues, maleylation or citraconylation can be used to protect them from digestion by trypsin. Deacylation of the purified peptides obtained from such a digestion, followed by further treatment with trypsin, should then effect cleavage at the lysine residues to give the same peptides that would have been obtained without any intervening modification, and also provides very important information on the order of those peptides in the overall amino acid sequence.
Trinitrobenzenesulfonate
2,4,6-Trinitrobenzenesulfonate reacts with the amino groups of proteins under mild, slightly alkaline conditions, and the products have a strong, easily detected UV-visible absorption (Equation 5) (Okuyama and Satake, 1960; Goldfarb, 1970, 1974; Fields, 1971, 1972).
$$\mathrm{(P){-}NH_3^+} + \mathrm{(NO_2)_3C_6H_2{-}SO_3^-} \xrightarrow{\text{pH 8-9}} \mathrm{(P){-}NH{-}C_6H_2(NO_2)_3} + \mathrm{SO_3^{2-}} + 2\,\mathrm{H^+} \qquad (5)$$
Because they are large and hydrophobic, the introduced trinitrophenyl substituents are likely to affect protein structure. Because they are easy to detect, however, their incorporation can usually be followed at low levels and, in some cases, correlated with other changes, for example, in biological activity (Coffee et al., 1971; Hartman et al., 1985). Because the reaction is quite specific for amino groups and easy to follow, it is often used to determine amino groups in proteins and to monitor changes in their number resulting from other modifications, procedures, or manipulations (Fields, 1972).
Selective Modifications of α- or ε-Amino Groups
Because they are usually more abundant than α-amino groups, ε-amino groups are the principal targets of most amino group modification procedures. Because of differences in the basicity and nucleophilic reactivity of α- and ε-amino groups, selective reaction with one or the other is possible, although not usually easy to achieve. Reactions of ε-amino groups, which are the more basic and the stronger nucleophiles, are strongly favored by high pH. At lower pH values, however, where ε-amino groups are largely protonated and α-amino groups only partially protonated, the latter are frequently the more reactive. Selectivity for α- over ε-amino groups should be maximal at pH values well below the pKa values of the α-amino group(s), but reaction rates can become very slow at unnecessarily low values. Other factors are sometimes also important. Reactions of proteins with nitrous acid, for example, require a low pH and affect mainly α-amino groups but, because of another important ionization (i.e., HONO + H⁺ ⇌ H₂O + NO⁺), are usually optimal around pH 3.5. Because of their greater abundance, some ε-amino groups are also usually affected (Shields et al., 1959; Wagner et al., 1969; Kurosky and Hofmann, 1972). Other procedures for the selective modification of α-amino groups also usually employ a low pH. Wetzel et al. (1990), for example, have described a procedure to selectively acylate the amino termini of peptides with iodoacetic anhydride at pH 6. The resulting iodoacetyl derivatives are themselves potent alkylating agents and are suitable for conjugation to thiol moieties of other peptides or proteins, of solid supports, and of a variety of other substances. Dixon and coworkers (1972) have described a procedure for the selective transamination of amino termini. The reaction takes place under mild, slightly acidic conditions (e.g., pH ~5), involves an aldehyde (usually glyoxylate), a heavy metal cation [usually copper(II) or nickel(II)], and relatively high concentrations of acetate ion or another weak base, and converts amino-terminal amino acids of peptides and proteins into the corresponding α-ketoacyl moieties. The high reactivity of periodate ion with 2-amino alcohols (~1000 times faster than its reaction with 1,2-diols) can be used to effect the specific oxidation of amino-terminal serine and threonine residues of proteins and peptides (Dixon and Fields, 1972). In the absence of sulfhydryl groups, which also react rapidly with periodate, the reaction is usually very specific for those two amino termini and proceeds quite rapidly at approximately neutral pH.
The aldehyde moieties that result are again quite reactive and can be conjugated to various fluorescent labels, biotin, cytotoxic drugs, and so forth (Geoghegan and Stroh, 1992).
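The pH dependence of α/ε selectivity described above can be illustrated with the Henderson-Hasselbalch fraction of each amine present as the unprotonated (nucleophilic) free base. The sketch assumes illustrative pKa values of about 7.7 for an α-amino group and 10.5 for an ε-amino group (individual residues in proteins vary), and it deliberately ignores the greater intrinsic nucleophilicity and abundance of ε-amino groups, so it captures only the protonation contribution to selectivity.

```python
def fraction_unprotonated(ph: float, pka: float) -> float:
    """Fraction of an amine present as the free base R-NH2 (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def alpha_over_epsilon(ph: float, pka_alpha: float = 7.7, pka_epsilon: float = 10.5) -> float:
    """Ratio of free-base fractions, alpha-NH2 : epsilon-NH2, at a given pH.

    Default pKa values are illustrative assumptions, not measurements for any
    particular protein.
    """
    return fraction_unprotonated(ph, pka_alpha) / fraction_unprotonated(ph, pka_epsilon)


if __name__ == "__main__":
    for ph in (5.0, 6.0, 7.0, 8.0, 9.0, 10.0):
        print(f"pH {ph:4.1f}: alpha/epsilon free-base ratio = {alpha_over_epsilon(ph):7.1f}")
```

In this model the ratio levels off near 10^(pKa,ε - pKa,α) well below the α-amino pKa, which is why selectivity is maximal there, while the free-base fraction of the α-amine itself, and hence the rate, keeps falling about tenfold per pH unit.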
MODIFICATION OF IMIDAZOLE GROUPS (HISTIDINE)
Diethyl Pyrocarbonate
Diethyl pyrocarbonate, also sometimes called ethoxyformic anhydride, reacts readily with most of the nucleophilic functional groups in proteins at high pH but is relatively specific for histidine residues at low pH (i.e., below pH ~7) (Equation 6) (Melchior and Fahrney, 1970; Miles, 1977). The acylated imidazole moieties have absorption maxima at about 230 to 242 nm (ε = 3.0 to 3.6 × 10³ M⁻¹ cm⁻¹) that can be used to follow the reaction or to determine the number of acylimidazole moieties introduced. Subsequent treatment of the modified proteins with hydroxylamine, usually at about pH 7, can be used to deacylate the ethoxyformylated histidine residues and frequently restores some of the biological activity lost during the modification.
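$$\mathrm{(P){-}Im{-}H} + \mathrm{C_2H_5O{-}CO{-}O{-}CO{-}OC_2H_5} \longrightarrow \mathrm{(P){-}Im{-}CO{-}OC_2H_5} + \mathrm{C_2H_5OH} + \mathrm{CO_2} \qquad (6)$$
where Im-H denotes the imidazole ring of a histidine residue, which is acylated at a ring nitrogen.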
Because diethyl pyrocarbonate also reacts readily with water, a large excess is necessary to obtain extensive modification, but this may lead to the formation of some diacylated histidine residues (i.e., where two equivalents of diethyl pyrocarbonate react with one histidine residue). The resulting diethoxyformylimidazole moieties are unstable and usually break down to still other products, some of which absorb strongly around 240 nm and sometimes contribute to overestimates of histidine acylation (Miles, 1977). Even at low pH, diethyl pyrocarbonate also reacts to some extent with other nucleophilic side chains. Acylation of tyrosine residues is accompanied by a decrease in absorption around 278 nm (Δε = 1300 M⁻¹ cm⁻¹), and those acyl moieties can again be removed by treatment with hydroxylamine (Bhattacharyya et al., 1992; Lei et al., 1995). Reactions with amino groups, particularly α- and other low-pKa amino groups, also appear to be common under the conditions usually employed but, because there is no accompanying absorption change, they frequently go unnoticed. Failure to obtain complete reactivation after apparently complete deacylation of histidine residues with hydroxylamine may sometimes reflect undetected acylation of such amino groups (Pasta et al., 1987; Levison et al., 1989; Anderson et al., 1994).
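The absorbance changes quoted above for ethoxyformylhistidine (an increase near 240 nm) and ethoxyformyltyrosine (a decrease near 278 nm) convert to residues modified per protein by the Beer-Lambert law. In the sketch below, the value 3.2 × 10³ M⁻¹ cm⁻¹ is assumed from within the 3.0 to 3.6 × 10³ range given in the text, and the absorbance readings and protein concentration are hypothetical; each wavelength is treated independently, which is itself a simplifying assumption.

```python
def residues_modified(delta_absorbance: float, protein_molar: float,
                      delta_epsilon: float, path_cm: float = 1.0) -> float:
    """Mol of modified residues per mol protein from an absorbance change.

    delta_absorbance: change relative to an unmodified control (sign ignored).
    protein_molar: protein concentration, mol/L.
    delta_epsilon: absorptivity change per modified residue, M^-1 cm^-1.
    """
    modified_molar = abs(delta_absorbance) / (delta_epsilon * path_cm)
    return modified_molar / protein_molar


# Ethoxyformylhistidine near 240 nm: assumed value within the range in the text.
EPS_HIS_240 = 3.2e3
# O-Ethoxyformyltyrosine: decrease of 1300 M^-1 cm^-1 near 278 nm (from the text).
DEPS_TYR_278 = 1.3e3

if __name__ == "__main__":
    protein = 10e-6  # hypothetical 10 uM protein in a 1-cm cell
    his = residues_modified(0.096, protein, EPS_HIS_240)
    tyr = residues_modified(-0.013, protein, DEPS_TYR_278)
    print(f"histidines modified ~ {his:.1f}, tyrosines modified ~ {tyr:.1f}")
```

As noted above, breakdown products of diacylated histidine can inflate the 240-nm estimate, and acylation of amino groups is invisible to this calculation.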
MODIFICATION OF GUANIDINO GROUPS (ARGININE)
Butanedione
2,3-Butanedione, cyclohexanedione, phenylglyoxal, and several other vicinal dicarbonyl compounds can be used to selectively modify arginine residues in proteins at approximately pH 7 to 8 and room temperature. Reaction rates and products vary with the dicarbonyl compound, of course, but also with the pH, the nature and concentration of the buffer, and other solvent components (Takahashi, 1968; Riordan, 1973; Cheung and Fonda, 1979; Epperly and Dekker, 1989). 2,3-Butanedione and phenylglyoxal usually react the most rapidly, are the most widely used, and their reactions are the best characterized. Reactions with 2,3-butanedione, for example, are strongly promoted by low concentrations of borate ion (Riordan, 1973). In its presence, the cyclic dihydroxyimidazoline adducts formed by addition of butanedione to guanidino groups appear to be converted into more stable cyclic borate diesters (Equation 7). Upon dilution or removal of the borate, those esters break down, the butanedione dissociates, and a significant number of arginine residues are regenerated.
$$\mathrm{(P){-}NH{-}C({=}NH_2^+)NH_2} + \mathrm{CH_3COCOCH_3} \rightleftharpoons \text{dihydroxyimidazoline adduct} \overset{\text{borate, pH 7-8}}{\rightleftharpoons} \text{cyclic borate diester} \qquad (7)$$
The number of modified arginine residues after various times or under different conditions can be determined by amino acid analysis after standard protein hydrolysis (i.e., in ~6 M HCl for 18 to 22 h at 110 °C), as arginine is not regenerated under those conditions. In the absence of borate, reactions with butanedione are much slower, and the unstable dihydroxyimidazoline adducts appear to be converted into still other products. Arginine cannot be recovered from the latter by dialysis or any other known means, and reactions under those conditions are therefore not reversible. Small amounts of the same products may also be formed in the presence of borate and may account for the incomplete regeneration of arginine and activity usually observed after removal of borate and excess butanedione.
Phenylglyoxal
Reactions of phenylglyoxal with arginine residues appear to proceed similarly. The initial dihydroxyimidazoline adducts are, again, not particularly stable and usually react further to give somewhat more stable derivatives incorporating two equivalents of phenylglyoxal (Takahashi, 1968) (Equation 8).
$$\mathrm{(P){-}NH{-}C({=}NH_2^+)NH_2} + 2\,\mathrm{C_6H_5COCHO} \longrightarrow \text{2:1 phenylglyoxal-arginine adduct} \qquad (8)$$
In some cases, […] occasionally with ε-amino groups (Takahashi, 1968; Nanduri and Murdak, 1990; Stole and Meister, 1991). Cyclohexanedione and several other dicarbonyl compounds are also used to modify arginine residues but are usually somewhat less effective and not as widely used. The absorption spectra of the adducts obtained from reactions with p-hydroxyphenylglyoxal (Yamasaki et al., 1980), p-nitrophenylglyoxal (Yamasaki et al., 1981), and 4-hydroxy-3-nitrophenylglyoxal (Borders et al., 1979) can also be used to detect and/or determine arginine residues in proteins.
MODIFICATION OF CARBOXYL GROUPS (α-COOH, ASPARTATE, AND GLUTAMATE)
Water-soluble Carbodiimides and Glycine Ethyl Ester
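Although several different kinds of reagents are sometimes used to modify carboxyl groups in proteins, one particular reaction involving a water-soluble carbodiimide and an amine nucleophile is the most widely used (Equation 9).
$$\mathrm{(P){-}COO^-} + \mathrm{R{-}N{=}C{=}N{-}R'} + \mathrm{H^+} \longrightarrow \mathrm{(P){-}CO{-}O{-}C({=}NR')NHR} \xrightarrow{\mathrm{R''NH_2}} \mathrm{(P){-}CO{-}NHR''} + \mathrm{RNH{-}CO{-}NHR'} \qquad (9)$$
where (P)-CO-O-C(=NR')NHR is the O-acylisourea intermediate and R''NH2 is the added amine nucleophile (e.g., glycine ethyl ester).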
Several water-soluble carbodiimides and many different amines can be employed, but 1-ethyl-3-(3'-N,N-dimethylaminopropyl)carbodiimide (i.e., R = C2H5-, R' = -CH2CH2CH2N(CH3)2) and glycine ethyl ester are by far the most common. The reaction is usually optimal around pH 4.5 to 5, reflecting its dependence on both the protonated form of the intermediate O-acylisourea and the unprotonated amine (Hoare and Koshland, 1967; Horinishi et al., 1968; Carraway and Koshland, 1972). Terminal and side chain carboxyl groups are both susceptible to reaction. Tyrosine and cysteine residues are also sometimes affected, but usually only at high carbodiimide concentrations (Carraway and Koshland, 1968; Carraway and Triplett, 1970). Nucleophiles other than glycine ethyl ester can sometimes be used to advantage, for example, to vary the electrostatic charge or to introduce particular kinds of substituents. Amines with low pKa values (e.g., other amino acid esters or amides, ethylenediamine) are the most effective (Lin and Koshland, 1969; Wang and Young, 1978; Yamada et al., 1981; Lin et al., 1990). In the absence of an added nucleophile, both water-soluble carbodiimides and dicyclohexylcarbodiimide, which is very poorly soluble in water, have been used to form amide linkages between closely spaced amino and carboxyl groups of proteins. The efficiency of these reactions appears to depend very strongly on the proximity of the two groups, and their principal use has been to identify amino and carboxyl groups that interact electrostatically in the contact regions between subunits of oligomeric proteins or in the interfacial contact regions of other kinds of protein-protein complexes. In some cases, the presence of N-hydroxysuccinimide or N-hydroxysulfosuccinimide, which allows the formation of transient succinimidyl ester intermediates, appears to increase the crosslinking efficiency (Yamada et al., 1983; Buechler and Taylor, 1989; Morand et al., 1989; Bartegi et al., 1990).
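The statement that the pH optimum reflects two opposing ionizations can be made concrete with a simple two-ionization model in which the rate is proportional to the fraction of O-acylisourea that is protonated times the fraction of amine that is unprotonated. This textbook idealization is an assumption made for illustration. The pKa of about 7.7 for glycine ethyl ester is an approximate value assumed here, and the pKa of the protonated O-acylisourea is a hypothetical number chosen so that the model optimum falls in the range quoted above.

```python
def relative_rate(ph: float, pka_intermediate: float, pka_amine: float) -> float:
    """Relative rate in a two-ionization model (an illustrative assumption).

    Rate ~ (fraction of O-acylisourea protonated) x (fraction of amine unprotonated).
    """
    protonated_intermediate = 1.0 / (1.0 + 10.0 ** (ph - pka_intermediate))
    free_amine = 1.0 / (1.0 + 10.0 ** (pka_amine - ph))
    return protonated_intermediate * free_amine


def optimum_ph(pka_intermediate: float, pka_amine: float, step: float = 0.01) -> float:
    """Grid search for the pH of maximal relative rate between pH 0 and 14."""
    grid = [i * step for i in range(int(round(14.0 / step)) + 1)]
    return max(grid, key=lambda ph: relative_rate(ph, pka_intermediate, pka_amine))


if __name__ == "__main__":
    PKA_INTERMEDIATE = 1.8     # hypothetical value for the protonated O-acylisourea
    PKA_GLY_ETHYL_ESTER = 7.7  # approximate value, assumed here
    best = optimum_ph(PKA_INTERMEDIATE, PKA_GLY_ETHYL_ESTER)
    print(f"model optimum: pH {best:.2f}")
```

In this model the optimum always lies midway between the two pKa values, and lowering the amine pKa raises the rate at any given pH, which is consistent with the observation that low-pKa amines are the most effective nucleophiles.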
MODIFICATION OF CARBOXAMIDE GROUPS (ASPARAGINE AND GLUTAMINE)
Deamidation
The amide groups of asparagine and glutamine residues are chemically similar, but those of asparagine are more susceptible to hydrolysis (i.e., deamidation). That hydrolysis is thought to proceed via intramolecular nucleophilic attack of the peptide backbone nitrogen on the side chain carbonyl group of the asparagine residue. The resulting succinimide intermediate then breaks down to give both aspartate and isoaspartate residues (Equation 10), usually in a ratio of about 1:3 (Meinwald et al., 1986; Geiger and Clarke, 1987; Lura and Schirch, 1988).
$$\text{Asn} \xrightarrow{-\,\mathrm{NH_3}} \text{succinimide} \xrightarrow{+\,\mathrm{H_2O}} \text{Asp} + \text{IsoAsp} \qquad (10)$$
Reaction rates depend strongly on the residue to which the acyl group of asparagine is attached (Asn-Gly sequences react the fastest, and asparagine followed by large, branched amino acids the slowest) and on the conformation and/or segmental mobility of such sequences (Stephenson and Clark, 1989; Tyler-Cross and Schirch, 1991; Wright, 1991; DiDonato et al., 1993). The reaction proceeds spontaneously at neutral and, more rapidly, at slightly alkaline pH, and it is enhanced at high ionic strength and by certain buffers. It is thought to be a normal part of protein aging and appears to account for some of the heterogeneity usually observed in long-lived proteins and for that which usually develops during the storage of highly purified proteins (Johnson et al., 1989; Artigues et al., 1993; Dorman et al., 1993). To bring about such deamidation deliberately, a protein is usually incubated at a slightly elevated temperature (i.e., ~37 °C) at a pH of about 8 to 9 in relatively high concentrations of ammonia and/or phosphate buffer (Tyler-Cross and Schirch, 1991; Tuong et al., 1992; DiDonato et al., 1993; Sharma et al., 1993; Garza-Ramos et al., 1994). Glutamine residues and peptide backbone amides are usually stable under these conditions and are not affected.
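If deamidation at a given site is treated as a first-order process, the expected product mixture after an incubation can be sketched from a half-life and the roughly 1:3 Asp:isoAsp ratio given above. The half-life here is a hypothetical input (it depends strongly on sequence, conformation, pH, temperature, and buffer, as described), and the sketch assumes the succinimide intermediate does not accumulate.

```python
import math


def deamidation_products(t: float, half_life: float, iso_fraction: float = 0.75) -> dict:
    """Fractions of one Asn site present as Asn, Asp, and isoAsp after time t.

    Assumes first-order loss of Asn (t and half_life in the same time units),
    no accumulation of the succinimide intermediate, and a fixed product ratio
    (default 1:3 Asp:isoAsp, i.e., iso_fraction = 0.75).
    """
    asn = math.exp(-math.log(2.0) * t / half_life)
    deamidated = 1.0 - asn
    return {
        "Asn": asn,
        "Asp": deamidated * (1.0 - iso_fraction),
        "isoAsp": deamidated * iso_fraction,
    }


if __name__ == "__main__":
    # Hypothetical site with a 1-day half-life, incubated for 1, 2, and 7 days.
    for days in (1, 2, 7):
        p = deamidation_products(days, half_life=1.0)
        print(days, {k: round(v, 3) for k, v in p.items()})
```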
MODIFICATION OF SULFHYDRYL GROUPS (CYSTEINE)
Many reagents are used to modify sulfhydryl groups in proteins. Some are used primarily to determine whether sulfhydryl groups are required for biological activity, to identify those involved, to determine the number present or the number required for activity, or to obtain stable derivatives that can be identified during sequence determinations or other kinds of structural studies. Because sulfhydryl groups are usually so reactive, they are also used to introduce many different kinds of labels into proteins and are frequently the targets of crosslinking agents. Some of the most useful sulfhydryl reagents are described below and in Table 3.
N-Ethylmaleimide
N-Ethylmaleimide, a stable, water-soluble solid, is commonly used to identify proteins that require sulfhydryl groups for biological activity. It reacts rapidly and usually specifically with sulfhydryl groups at approximately neutral pH (Equation 11). Although it can also react with other nucleophilic groups, as mentioned previously (Brewer and Riehm, 1967), those reactions are usually several orders of magnitude slower. To minimize any possibility of the latter, reactions are usually conducted around pH 6 to 7, where the rates of its reactions with thiols and with those other groups are all suboptimal (Bednar, 1990). The resulting N-ethylsuccinimidyl moieties, which can form as either of two stereoisomers, are relatively stable, large, and somewhat apolar. The ethylamine and S-(2-succinyl)cysteine produced upon standard acid hydrolysis can usually be detected by amino acid analysis. 5,5'-Dithio(2-nitrobenzoic acid), which is discussed below, and other reagents commonly used to determine thiol groups can be used to determine the extent of such reactions and sometimes to follow them.
Because they are relatively easy to prepare and react very similarly, many simple N-alkyl- and N-arylmaleimides have been used to modify proteins. Long-chain, more hydrophobic N-alkylmaleimides, for example, frequently react with proteins even faster than N-ethylmaleimide (Anderson et al., 1970; Anderson and Vasini, 1970; Wang et al., 1992; Kalgutkar and Marnett, 1994). Other N-substituted maleimides have been designed to introduce fluorescent and other kinds of spectroscopic labels or probes into proteins (Weltman et al., 1973; Haugland, 1989; Tyagi, 1991; Bhattacharjee and Bhaduri, 1992; Palmer et al., 1993; Turina et al., 1993; Pan and Cherry, 1995). Some have been employed as homobifunctional and heterobifunctional crosslinking agents (Moore and Ward, 1956; Wong, 1991; Pierce Chemical Co., 1994a). N-Substituted maleimides with one or more charged groups are relatively membrane-impermeable and have been used to modify and detect surface sulfhydryl groups of biological membranes (Abbott and Schachter, 1976; May, 1989). N-Ethylmaleimide, on the other hand, is quite membrane-permeable. Many N-substituted maleimides are commercially available. Iodoacetamide, bromoacetamide, iodoacetic acid, and bromoacetic acid are also used to modify sulfhydryl groups under conditions similar to those used with N-ethylmaleimide and usually give similar results. More detailed information on their reactions can be found elsewhere (Means and Feeney, 1971; Glazer and Delange, 1975; Lundblad, 1991, 1995).
Methyl Methanethiosulfonate
Methyl methanethiosulfonate is not as widely used as N-ethylmaleimide but is commercially available and also reacts very rapidly and specifically with sulfhydryl groups at or around neutral pH (Equation 12).
$$\mathrm{(P){-}SH} + \mathrm{CH_3SO_2SCH_3} \longrightarrow \mathrm{(P){-}SSCH_3} + \mathrm{CH_3SO_2H} \qquad (12)$$
Because the introduced substituent is relatively small and has no electrostatic charge, it usually has little effect on protein structure. Perhaps even more importantly, it can usually be removed under mild conditions by subsequent treatment with excess β-mercaptoethanol or another thiol (Smith and Kenyon, 1974; Kenyon and Bruice, 1977). Other alkyl methanethiosulfonates are not commercially available but can usually be prepared without great difficulty and used to introduce larger and more hydrophobic substituents, charged groups, and other substituents into proteins (Akabas et al., 1992; Stauffer and Karlin, 1994; Pathak et al., 1995).
Dithio(2-nitrobenzoate)
5,5'-Dithio(2-nitrobenzoate) is widely used both to modify and to determine sulfhydryl groups in proteins. With solvent-accessible sulfhydryl groups, the reaction (Equation 13) is usually quite rapid at neutral to slightly alkaline pH (~7.0 to 8.5), and no other functional groups are affected (Ellman, 1959; Riddles et al., 1979).
$$\mathrm{(P){-}SH} + \mathrm{ArSSAr} \longrightarrow \mathrm{(P){-}SSAr} + \mathrm{ArS^-} + \mathrm{H^+} \qquad (13)$$
where Ar is the 3-carboxylato-4-nitrophenyl group, so that ArSSAr is 5,5'-dithio(2-nitrobenzoate) (DTNB) and ArS⁻ is the 5-thio-2-nitrobenzoate dianion (TNB).
Recently deionized urea or guanidine hydrochloride can usually be employed to increase the reactivity of inaccessible and other slow-reacting or unreactive sulfhydryl groups. The presence of low molecular weight disulfides (i.e., RSSR, where R is a small alkyl or substituted alkyl group) can also usually increase reaction rates. Low concentrations of cystamine, for example, increase the reactivity of Cys-34 in bovine serum albumin by more than 100-fold, reflecting its location in a narrow anionic crevice (Wilson et al., 1980). The 5-thio-2-nitrobenzoate dianion produced by the reaction of 5,5'-dithio(2-nitrobenzoate) with sulfhydryl groups has a strong absorption at ~412 nm (ε = 13,600 M⁻¹ cm⁻¹) that can be used to follow reactions and determine the number of reactive sulfhydryl groups (Ellman, 1959; Riddles et al., 1979). Below about pH 7, reactions are frequently too slow to be useful, whereas above about pH 8.5, hydrolysis and disproportionation of the reagent (Equations 14, 15, and 16) produce significant amounts of 5-thio-2-nitrobenzoate dianion, which can be mistaken for slow-reacting sulfhydryl groups and interferes with their detection when they are present (Riddles et al., 1979).
$$2\,\mathrm{ArSSAr} + 2\,\mathrm{OH^-} \longrightarrow 2\,[\mathrm{ArSOH}] + 2\,\mathrm{ArS^-} \qquad (14)$$
$$2\,[\mathrm{ArSOH}] \longrightarrow \mathrm{ArSO_2^-} + \mathrm{ArS^-} + 2\,\mathrm{H^+} \qquad (15)$$
$$2\,\mathrm{ArSSAr} + 2\,\mathrm{OH^-} \longrightarrow \mathrm{ArSO_2^-} + 3\,\mathrm{ArS^-} + 2\,\mathrm{H^+} \qquad (16)$$
Metal ion-catalyzed reoxidation of 5-thio-2-nitrobenzoate by oxygen (Equation 17) is also prominent at high pH and usually interferes with sulfhydryl group determinations under those conditions (e.g., above pH ~8.5).
$$2\,\mathrm{ArSH} + \tfrac{1}{2}\,\mathrm{O_2} \longrightarrow \mathrm{ArSSAr} + \mathrm{H_2O} \qquad (17)$$
However, because metal ions are involved, reoxidation can usually be suppressed by adding EDTA or by excluding oxygen (Riddles et al., 1979).
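The Ellman determination described above reduces to a Beer-Lambert calculation using the 412-nm absorptivity of 5-thio-2-nitrobenzoate given in the text. The sketch below subtracts a reagent blank, which partly compensates for 5-thio-2-nitrobenzoate released by hydrolysis and disproportionation of the reagent; the absorbance readings and protein concentration in the example are hypothetical, and any dilution on adding reagent is assumed to have been accounted for already.

```python
EPSILON_TNB_412 = 13_600.0  # M^-1 cm^-1, value given in the text


def thiols_per_protein(a412_sample: float, a412_blank: float, protein_molar: float,
                       path_cm: float = 1.0, epsilon: float = EPSILON_TNB_412) -> float:
    """Mol of reactive SH per mol protein from TNB released at 412 nm.

    a412_blank: reagent blank (buffer + DTNB, no protein) read at the same time,
        correcting for TNB formed by hydrolysis/disproportionation of the reagent.
    protein_molar: protein concentration in the cuvette, mol/L.
    """
    tnb_molar = (a412_sample - a412_blank) / (epsilon * path_cm)
    return tnb_molar / protein_molar


if __name__ == "__main__":
    # Hypothetical readings for a 10 uM protein in a 1-cm cell.
    n_sh = thiols_per_protein(a412_sample=0.292, a412_blank=0.020, protein_molar=10e-6)
    print(f"{n_sh:.2f} SH per protein")
```

Because a vicinal pair that closes to a disulfide also releases two equivalents of 5-thio-2-nitrobenzoate, this count alone does not distinguish independent from vicinal sulfhydryl groups; the 323-nm absorbance of the incorporated substituent is needed for that.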
The 5-thio-2-nitrobenzoate substituent introduced upon reaction with 5,5'-dithio(2-nitrobenzoate) is relatively large, hydrophobic, and anionic, and it often affects a protein's structure. Its absorption spectrum is similar to that of the reagent but less intense (λmax = 323 nm, ε = 2,500 M⁻¹ cm⁻¹), and it can usually be used to determine the number of substituents introduced (Colman, 1969; Riddles et al., 1979). β-Mercaptoethanol and other simple thiols react readily with the incorporated substituents and, in large excess, can be used to regenerate the original protein sulfhydryl groups (Equation 18) and sometimes the original biological activity (Kastenschmidt et al., 1968; Ploux et al., 1995).

$$\text{P-SSAr} + 2\,\text{RSH}\ (\text{excess}) \longrightarrow \text{P-SH} + \text{RSSR} + \text{ArS}^- + \text{H}^+ \tag{18}$$

In proteins with closely spaced, so-called vicinal, sulfhydryl groups, the reaction with 5,5'-dithio(2-nitrobenzoate) sometimes results in the formation of intramolecular disulfide bonds (Equation 19). In such cases, only one equivalent of 5,5'-dithio(2-nitrobenzoate) is consumed but two equivalents of 5-thio-2-nitrobenzoate dianion are released, and none is incorporated into the protein (Wassarman and Major, 1969; Lewis et al., 1993).

$$\text{P(SH)}_2 + \text{ArSSAr} \longrightarrow \text{P(S-S)} + 2\,\text{ArS}^- + 2\,\text{H}^+ \tag{19}$$

Based on the amount of 5-thio-2-nitrobenzoate released, the stoichiometry is exactly the same as if two sulfhydryl groups had reacted independently with 5,5'-dithio(2-nitrobenzoate), but no ~323-nm chromophore is incorporated into the protein. Cyanide ion can also be used to remove 5-thio-2-nitrobenzoate moieties and convert the original thiol groups into thiocyanate derivatives (Equation 20).

$$\text{P-SSAr} + \text{CN}^- \longrightarrow \text{P-SCN} + \text{ArS}^- \tag{20}$$

Some proteins are reactivated or partially reactivated by such treatment, presumably because the new substituent is smaller and/or uncharged (Huynh and Snell, 1986; Cunningham et al., 1990). 2-Nitro-5-thiocyanobenzoate, a commercially available analog of 5,5'-dithio(2-nitrobenzoate), reacts directly with sulfhydryl groups to give the same thiocyanate derivatives and one equivalent of 5-thio-2-nitrobenzoate dianion (Kindman and Jencks, 1981; Altamirano et al., 1992). The latter's formation is not stoichiometric, however, as significant amounts of a 5-thio-2-nitrobenzoate adduct are also usually obtained, depending largely on the accessibility and environment of the particular sulfhydryl group(s) (Equations 21a & 21b).
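The two spectral readouts described above (released 5-thio-2-nitrobenzoate at 412 nm, bound substituent at 323 nm) can be compared to tell ordinary labeling from vicinal disulfide formation. This is a minimal sketch under stated assumptions: A323 is measured on the protein after excess reagent and free 5-thio-2-nitrobenzoate have been removed, and the 0.2 tolerance and sample values are hypothetical.

```python
"""Sketch: distinguish independent from vicinal SH reactions with DTNB.

Assumptions: A412 is measured on the reaction mixture (released TNB),
A323 is measured on the protein after excess reagent and free TNB have been
removed, and the 0.2 tolerance is an arbitrary illustrative threshold.
"""

EPS_RELEASED_412 = 13_600.0  # M^-1 cm^-1, free 5-thio-2-nitrobenzoate dianion
EPS_BOUND_323 = 2_500.0      # M^-1 cm^-1, protein-bound 5-thio-2-nitrobenzoate


def classify_dtnb_reaction(a412_released: float, a323_bound: float,
                           protein_molar: float, path_cm: float = 1.0,
                           tol: float = 0.2) -> tuple[float, float, str]:
    """Return (TNB released per protein, TNB bound per protein, interpretation)."""
    released = a412_released / (EPS_RELEASED_412 * path_cm) / protein_molar
    bound = a323_bound / (EPS_BOUND_323 * path_cm) / protein_molar
    if released <= 0:
        return released, bound, "no reaction"
    ratio = bound / released
    if ratio >= 1 - tol:
        # Ordinary mixed-disulfide labeling: one bound substituent per TNB released.
        label = "independent"
    elif ratio <= tol:
        # Equation 19: two TNB released per disulfide formed, none bound.
        label = "vicinal"
    else:
        label = "mixed"
    return released, bound, label


if __name__ == "__main__":
    # Hypothetical 10 uM protein releasing 2 TNB per molecule in both cases.
    print(classify_dtnb_reaction(0.272, 0.050, 10e-6))  # ~2.0 released, ~2.0 bound
    print(classify_dtnb_reaction(0.272, 0.0, 10e-6))    # ~2.0 released, 0 bound
```

Intermediate ratios would be expected when some sulfhydryl groups react independently and others pair up.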
GARY E. MEANS, HAO ZHANG, and MIN LE
42

$$\text{P-SH} + \text{ArSCN} \longrightarrow \text{P-SCN} + \text{ArS}^- + \text{H}^+ \tag{21a}$$

$$\text{P-SH} + \text{ArSCN} \longrightarrow \text{P-SSAr} + \text{CN}^- + \text{H}^+ \tag{21b}$$

where ArSCN is 2-nitro-5-thiocyanobenzoate and ArS⁻ is 5-thio-2-nitrobenzoate.

Dipyridyl Disulfide
In addition to 5,5'-dithio(2-nitrobenzoate), many other disulfide compounds can be used to modify and/or detect sulfhydryl groups in proteins, and some of them have important advantages and/or special uses. Both 2,2'-dipyridyl disulfide and 4,4'-dipyridyl disulfide, for example, react very similarly with sulfhydryl groups (Equation 22) and are probably the most important (Grassetti and Murray, 1967).

$$\text{P-SH} + \text{PySSPy} \longrightarrow \text{P-SSPy} + \text{PySH} \tag{22}$$

where Py is a 2- or 4-pyridyl group; the released PySH exists predominantly as the corresponding thiopyridone.
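The practical effect of the extinction coefficients given for these chromophores in the next paragraph (ε342 = 7,060 and ε324 = 19,800 M⁻¹ cm⁻¹, versus 13,600 M⁻¹ cm⁻¹ for 5-thio-2-nitrobenzoate at 412 nm) can be sketched as a detection limit. This assumes, hypothetically, that 0.01 is the smallest reliably measurable absorbance change in a 1-cm cell.

```python
"""Sketch: compare the sensitivity of thiol assays via their chromophores.

The extinction coefficients are those quoted in this section; the smallest
reliably measurable absorbance change (0.01) is a hypothetical assumption.
"""

EPSILON = {  # M^-1 cm^-1
    "5-thio-2-nitrobenzoate, 412 nm": 13_600.0,
    "2-thiopyridone, 342 nm": 7_060.0,
    "4-thiopyridone, 324 nm": 19_800.0,
}


def min_detectable_molar(epsilon: float, min_delta_a: float = 0.01,
                         path_cm: float = 1.0) -> float:
    """Lowest thiol concentration giving an absorbance change of min_delta_a."""
    return min_delta_a / (epsilon * path_cm)


if __name__ == "__main__":
    for name, eps in EPSILON.items():
        print(f"{name}: {min_detectable_molar(eps) * 1e6:.2f} uM")
    ratio = EPSILON["4-thiopyridone, 324 nm"] / EPSILON["5-thio-2-nitrobenzoate, 412 nm"]
    print(f"4-thiopyridone / TNB signal ratio: {ratio:.2f}")  # 1.46
```

The ratio of about 1.46 corresponds to the roughly 50% sensitivity advantage of 4,4'-dipyridyl disulfide noted in the text.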
As with 5,5'-dithio(2-nitrobenzoate), both reactions appear to be absolutely specific for sulfhydryl groups and give readily detectable chromophores (ε342 = 7,060 M⁻¹ cm⁻¹ and ε324 = 19,800 M⁻¹ cm⁻¹, respectively). Both reagents are weak bases (pKa = 2.5 and 5.1, respectively; Grimshaw et al., 1979; Brocklehurst, 1982), and because they become partially protonated, their reactivities are enhanced at low pH. Below pH 7, for example, where reactions with 5,5'-dithio(2-nitrobenzoate) are very slow, both 2,2'- and 4,4'-dipyridyl disulfide still react rapidly with sulfhydryl groups and, because hydroxide ion concentrations are low, there is virtually no hydrolysis and disproportionation (Equations 14-16) or metal ion-catalyzed reoxidation (Equation 17) (Le and Means, 1995). Because 4-thiopyridone has an extinction coefficient about 50% greater than that of 5-thio-2-nitrobenzoate dianion, 4,4'-dipyridyl disulfide is also more useful than 5,5'-dithio(2-nitrobenzoate) for determining sulfhydryl groups at very low concentrations. Mixed disulfides of 2-thiopyridine with a wide variety of sulfhydryl compounds can be prepared; some are commercially available and have become quite important. They also react rapidly and very specifically with the sulfhydryl groups of proteins and can be used to introduce many different kinds of labels into proteins (e.g., I and II) or as homo- and heterobifunctional crosslinking agents (e.g., III and IV) (Stuchbury et al., 1975; Carlsson et al., 1978; Chong and Hodges, 1982; Pierce Chem. Co., 1994b).

[Structures I-IV: 2-pyridyl mixed-disulfide labeling reagents (I, II) and crosslinking agents (III, IV)]

Selective Reactions with Vicinal Sulfhydryl Groups
Disulfides like 5,5'-dithio(2-nitrobenzoate), 2,2'- and 4,4'-dipyridyl disulfide, and other mild oxidants (e.g., methyl methanethiosulfonate) can sometimes be used to form disulfide bonds between closely spaced sulfhydryl groups in proteins (e.g., Equation 19). These so-called vicinal sulfhydryl groups have been observed in a number of proteins and are of special interest when they appear to have an important functional role (Zahler and Cleland, 1968; Frost and Lane, 1985; Berleth et al., 1992; Stancato et al., 1993). While vicinal sulfhydryl groups are sometimes detected by their reaction with 5,5'-dithio(2-nitrobenzoate), methyl methanethiosulfonate, or some other oxidant, they are usually detected more easily by their reactions with arsenite ion, phenylarsine oxide (i.e., C₆H₅AsO), or one of several other organoarsenic (III) compounds (Equation 23).
$$\text{P(SH)}_2 + \text{AsO}_2^- \longrightarrow \text{P(S)}_2\text{As-OH} + \text{OH}^- \tag{23}$$
Arsenite Ion
Arsenite ion (AsO₂⁻) and other bivalent arsenic (III) compounds form stable complexes with dithiothreitol and many other simple dithiols (Webb, 1966; Lopez et al., 1990). The stabilities of the complexes vary with the arsenic (III) and dithiol compounds, particularly with the size of the resulting dithioarsenite rings, the pH, and various other factors. Dissociation constants for the complexes of arsenite with dithiothreitol or several other relatively simple dithiols are usually below 1 μM at neutral pH. Closely spaced sulfhydryl groups in proteins are thought to react similarly, and the resulting complexes are often of similar stability (Frost and Lane, 1985; Lopez et al., 1990; Bjork and Ylinenjarvi, 1992; Simons and Pratt, 1995).
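To see what a sub-micromolar dissociation constant implies for site occupancy, here is a minimal sketch assuming simple 1:1 binding between arsenite and a single vicinal-dithiol site. The 1 μM Kd and the concentrations are hypothetical values chosen to match the range mentioned above.

```python
"""Sketch: fraction of a vicinal-dithiol site occupied by arsenite.

Assumes simple 1:1 binding P + L <-> PL with dissociation constant kd;
the concentrations and kd below are hypothetical, chosen to match the
"below 1 uM" range mentioned in the text.
"""
import math


def fraction_bound(p_total: float, l_total: float, kd: float) -> float:
    """Exact 1:1 binding (accounts for ligand depletion); returns [PL]/p_total."""
    s = p_total + l_total + kd
    disc = math.sqrt(s * s - 4.0 * p_total * l_total)
    # Numerically stable smaller root of [PL]^2 - s*[PL] + p_total*l_total = 0.
    pl = 2.0 * p_total * l_total / (s + disc)
    return pl / p_total


if __name__ == "__main__":
    for l_total in (1e-6, 1e-5, 1e-4):
        print(f"{l_total * 1e6:>6.0f} uM arsenite: "
              f"{fraction_bound(1e-6, l_total, 1e-6):.3f} of sites occupied")
```

The quadratic form is used instead of the simpler [L]/(Kd + [L]) so that the estimate stays valid when arsenite is not in large excess over the protein.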
Phenylarsine Oxide
Phenylarsine oxide (or phenylarsenoxide) and other organoarsenic (III) compounds are larger, are usually more lipophilic, and form complexes with proteins that are generally more stable than those with arsenite (Stevenson et al., 1978; Brown et al., 1987; Lopez et al., 1990; Li and Pickart, 1995). Factors affecting the stability of such complexes are not thoroughly understood but clearly reflect, among other things, differences in sulfhydryl group proximity and flexibility. In cases where biological activity is affected, dithiothreitol and other simple dithiols will usually reverse those effects whereas monothiols usually will not.
MODIFICATION OF DISULFIDE BONDS (CYSTINE)
Reduction by Dithiothreitol and Other Thiols
Simple monothiols like β-mercaptoethanol and glutathione react with the disulfide bonds of proteins, but the reactions are reversible and strongly favor the native form of most proteins (Seidler et al., 1993; Zapun et al., 1993) (Equation 24).

$$
\begin{aligned}
\text{P(S-S)} + \text{RSH} &\rightleftharpoons \text{P(SH)(S-SR)}\\
\text{P(SH)(S-SR)} + \text{RSH} &\rightleftharpoons \text{P(SH)}_2 + \text{RSSR}
\end{aligned}
\tag{24}
$$

Such reactions can usually be driven to completion, however, with a large excess of thiol under conditions where native structures are altered or destabilized, for example, by urea or guanidine hydrochloride (Lin and Kim, 1989). The rare disulfide bonds in native proteins that undergo significant reduction without such destabilization are presumably already unstable (Mitchinson and Wells, 1989; Kirley, 1990). Dithiothreitol, dithioerythritol, and some other dithiols are stronger reducing agents than monothiols and are usually more effective, but some kind of destabilizing condition is still usually necessary to achieve significant reduction (Singh and Whitesides, 1994). Mixed disulfide bonds between proteins and small thiols like glutathione or cysteine, disulfide bonds between two different peptide chains, and disulfide bonds in proteins that have undergone partial proteolysis are usually more susceptible to reduction (Toniyama et al., 1990; Gravina and Mieyal, 1993; Yamashita et al., 1995). Disulfide bonds introduced by site-directed mutagenesis, and those in proteins whose structure has been destabilized by mutagenesis at some other location, are also usually much more susceptible to reduction by both mono- and dithiols (Wetzel et al., 1988; Mitchinson and Wells, 1989; Goldenberg et al., 1993).
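Why a large thiol excess and structural destabilization both push Equation 24 toward reduction can be sketched from the overall equilibrium for a monothiol, P(S-S) + 2RSH ⇌ P(SH)₂ + RSSR. This is a minimal illustration, not a model of any particular protein: both equilibrium constants are hypothetical, and [RSH] and [RSSR] are assumed to stay constant. Dithiothreitol forms a cyclic disulfide when oxidized, so its equilibrium expression would differ.

```python
"""Sketch: why excess thiol and destabilization drive disulfide reduction.

Net reaction (from Equation 24 for a monothiol RSH):
    P(S-S) + 2 RSH  <=>  P(SH)2 + RSSR,
    K = [P(SH)2][RSSR] / ([P(S-S)][RSH]^2)   (units: M^-1)
RSH is assumed to be in large excess so its concentration stays constant.
Both K values used below are hypothetical, for illustration only.
"""


def fraction_reduced(k_eq: float, rsh_molar: float, rssr_molar: float) -> float:
    """Equilibrium fraction of protein in the reduced P(SH)2 form."""
    q = k_eq * rsh_molar ** 2 / rssr_molar  # dimensionless [P(SH)2]/[P(S-S)]
    return q / (1.0 + q)


if __name__ == "__main__":
    rssr = 1e-3  # M, disulfide present or formed (assumed)
    for label, k in (("stable native disulfide", 1e-4), ("destabilized (e.g., urea)", 1.0)):
        for rsh in (0.01, 0.1):
            print(f"{label}, {rsh:g} M RSH: {fraction_reduced(k, rsh, rssr):.3f}")
```

Because [RSH] enters squared, a tenfold increase in thiol raises the reduced-to-oxidized ratio a hundredfold, but a very small K for a stable native disulfide still keeps reduction negligible.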
MODIFICATION OF THIOETHER GROUPS (METHIONINE)
Hydrogen peroxide (Stauffer and Etson, 1969), chloramine T (Shechter et al., 1975), and several other oxidizing agents can be used to convert methionine residues of proteins into sulfoxide derivatives (Equation 25).

$$\text{P-SCH}_3 + [\text{O}] \longrightarrow \text{P-S(O)CH}_3 \tag{25}$$

Sulfhydryl groups, if present, usually react even faster, and tryptophan residues are also sometimes affected. Because they are relatively hydrophobic, methionine side chains are not always very solvent accessible and therefore not very reactive; only the most accessible methionine residues are readily modified.
Hydrogen Peroxide
At low pH (i.e., below 3), hydrogen peroxide usually reacts fairly specifically with methionine residues in proteins that have no sulfhydryl groups. Under such conditions, for example, the reaction of ubiquitin with excess H₂O₂ affected only Met-1 but gave two Met-1 monosulfoxide stereoisomers that differed considerably in their conformational stability (Bamezai et al., 1990).
Chloramine T
At approximately neutral pH, chloramine T (i.e., the sodium salt of N-chloro-p-toluenesulfonamide) reacts fairly specifically with methionine side chains in proteins that do not have sulfhydryl groups. Tryptophan residues are not usually affected under those conditions but are very susceptible at lower pH. Compared with hydrogen peroxide, relatively little chloramine T is usually required for such oxidations (Lawrence and Loskutoff, 1986; Amiconi et al., 1989; Miles and Smith, 1993). Methionine sulfoxide residues in proteins can be reduced by simple sulfhydryl compounds to regenerate methionine and, in some cases, to restore some of the lost biological activity (Equation 26).

$$\text{P-S(O)CH}_3 + 2\,\text{RSH} \xrightarrow{\;5\%\ \text{HOAc}\;} \text{P-SCH}_3 + \text{RSSR} + \text{H}_2\text{O} \tag{26}$$

High concentrations of β-mercaptoethanol or of a slightly stronger thiol reducing agent (dithiothreitol, mercaptoacetic acid, or N-methylmercaptoacetamide) are usually employed and are most effective under acidic conditions (e.g., ~5% acetic acid; Houghton and Li, 1983). Disulfide bonds, if present, are cleaved under the same conditions but can frequently be restored by subsequent refolding and reoxidation.
MODIFICATION OF INDOLE GROUPS (TRYPTOPHAN)
N-Bromosuccinimide
N-Bromosuccinimide is widely used to modify tryptophan residues in proteins, although some evidence suggests that N-chlorosuccinimide may actually be superior (Spande et al., 1966; Spande and Witkop, 1967). In both cases, a low pH (i.e., pH ~4 to 5) is used, and the reaction (Equation 27) usually causes a decrease in absorption around 280 nm (Δε = 4,500 M⁻¹ cm⁻¹) that can be used to determine the extent of reaction (Shechter et al., 1976; Ohnishi et al., 1980). With low molecular weight tryptophan derivatives, 1.5 to 2 equivalents of N-bromosuccinimide are usually consumed for each tryptophan residue, whereas about two to six equivalents are usually necessary with proteins. The oxindole products of the reaction are not very stable and, depending on the conditions, frequently undergo some peptide bond scission and/or react with additional N-bromo- or N-chlorosuccinimide. Sulfhydryl groups, if present, are usually oxidized, as are some methionine residues. Tyrosine residues can also be modified, especially when an excess of either reagent is used. Under the same conditions, however, reactions of N-bromosuccinimide with simple tryptophan derivatives are about 1,000 times faster than those with comparable tyrosine derivatives (Ohnishi et al., 1980).
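Using the 280-nm difference extinction coefficient quoted above, the number of tryptophans oxidized can be estimated from the absorbance drop. This is a minimal sketch: the dilution correction for added reagent volume and the sample values are assumptions, and contributions from tyrosine or other side reactions are ignored.

```python
"""Sketch: count tryptophans oxidized by N-bromosuccinimide from the A280 drop.

Uses the difference extinction coefficient quoted in the text
(4,500 M^-1 cm^-1). The dilution correction and the sample numbers are
assumptions; contributions from tyrosine or other side reactions are ignored.
"""

DELTA_EPS_280 = 4_500.0  # M^-1 cm^-1 per tryptophan oxidized


def trp_oxidized(a280_before: float, a280_after: float, protein_molar: float,
                 v_initial: float = 1.0, v_added: float = 0.0,
                 path_cm: float = 1.0) -> float:
    """Tryptophans modified per protein molecule (protein_molar = initial conc.)."""
    # Rescale the post-reaction reading to the original volume.
    a_after_corrected = a280_after * (v_initial + v_added) / v_initial
    delta_a = a280_before - a_after_corrected
    return delta_a / (DELTA_EPS_280 * path_cm * protein_molar)


if __name__ == "__main__":
    # Hypothetical 6 uM protein whose A280 falls from 0.500 to 0.419.
    print(f"{trp_oxidized(0.500, 0.419, 6e-6):.1f} Trp oxidized per protein")  # 3.0
```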
MODIFICATION OF PHENOLIC GROUPS (TYROSINE)
Tyrosine side chains of proteins are subject to many kinds of electrophilic substitution and may also be readily acylated. Iodination was one of the earliest procedures used to modify proteins and affects mainly tyrosine residues, as shown in Equation 28. It also usually affects cysteine, if present, and histidine residues, and it sometimes affects tryptophan and methionine residues (Koshland et al., 1963). It is particularly important as the most widely used method to radiolabel proteins.

$$
\begin{aligned}
\text{P-Tyr} + \text{ICl} &\longrightarrow \text{P-(3-iodo)Tyr} + \text{H}^+ + \text{Cl}^-\\
\text{P-(3-iodo)Tyr} + \text{ICl} &\longrightarrow \text{P-(3,5-diiodo)Tyr} + \text{H}^+ + \text{Cl}^-
\end{aligned}
\tag{28}
$$

Iodination
Two isotopes of iodine, ¹²⁵I and ¹³¹I, are commonly used to radioiodinate proteins. In addition to its familiarity and convenience, the principal advantage of radioiodination appears to be the very high specific activities that can be obtained (i.e., ~2,125 Ci/mmol for each incorporated ¹²⁵I and ~6,500 Ci/mmol for each incorporated ¹³¹I) (Wilbur, 1992). Among its disadvantages is the tendency of iodine to react with several different kinds of functional groups, particularly its very fast reaction with sulfhydryl groups. Because both monoiodotyrosine and diiodotyrosine residues are sometimes obtained, and both are larger and appreciably more acidic than tyrosine, their formation also sometimes affects protein structure (Covelli and Wolff, 1966). The relatively short half-lives of ¹²⁵I and ¹³¹I (~60 and 8.1 days, respectively) are also often a disadvantage, necessitating decay corrections and precluding some relatively long-term applications and long-term storage of radioiodinated proteins. Elemental iodine is relatively insoluble in water but soluble in aqueous sodium or potassium iodide, giving red-brown solutions of triiodide ion (I₂ + I⁻ ⇌ I₃⁻) that are convenient for the iodination of proteins. Under slightly alkaline conditions, reactions with the small amount of iodine in equilibrium with triiodide ion are usually fast and may sometimes be followed by monitoring decreases in triiodide concentration at ~355 nm (Cunningham and Nuenke, 1961). The products, mono- and diiodotyrosine, have increased absorbances in the near UV, and their approximate amounts can usually be determined, after removal of triiodide, from their absorptions according to the procedure of Edelhoch (1962). When radioactive iodine is used, procedures designed to achieve more efficient incorporation and the highest possible specific activity are usually preferred.
Most of those procedures involve the formation of ICl from radioactive iodide ion and an electrophilic chlorine donor, such as N-chlorosuccinimide or chloramine T (Greenwood et al., 1963), that serves as a source of Cl⁺ (i.e., *I⁻ + Cl⁺ → *ICl). The ICl formed then reacts with proteins to effect their iodination without any isotope dilution (Greenwood et al., 1963; Lawrence and Loskutoff, 1986). Because chloramine T and related N-chloro compounds are potent oxidants and have their own effects on proteins (see the earlier sections on thioether and indole groups), they are usually employed in very low amounts. Several insoluble N-chloro compounds, such as 1,3,4,6-tetrachloro-3α,6α-diphenylglycoluril (Fraker and Speck, 1978) and derivatized polystyrene beads containing chloramine T-like moieties (Markwell, 1982), are also used to effect iodination and, because they interact less with soluble proteins, usually have fewer direct effects on them. With these insoluble reagents, reactions are also conveniently stopped, and modified proteins separated from the reagent, simply by decantation.
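The decay corrections mentioned above follow first-order kinetics, A = A₀·2^(−t/t½). This is a minimal sketch using the approximate half-lives given in the text (~60 days for ¹²⁵I, 8.1 days for ¹³¹I); the function names and example values are illustrative.

```python
"""Sketch: radioactive-decay corrections for radioiodinated proteins.

Half-lives are the approximate values given in the text (~60 and 8.1 days);
the function names and the example numbers are illustrative assumptions.
"""

HALF_LIFE_DAYS = {"125I": 60.0, "131I": 8.1}


def fraction_remaining(isotope: str, days: float) -> float:
    """Fraction of the original activity left after `days` (A/A0 = 2^(-t/t_half))."""
    return 2.0 ** (-days / HALF_LIFE_DAYS[isotope])


def activity_at_reference(measured: float, isotope: str, days_since_ref: float) -> float:
    """Back-correct a measured activity to the reference (e.g., labeling) date."""
    return measured / fraction_remaining(isotope, days_since_ref)


if __name__ == "__main__":
    print(f"{fraction_remaining('125I', 30):.3f}")   # 0.707
    print(f"{fraction_remaining('131I', 30):.3f}")   # 0.077
    # 1.0 uCi measured 16.2 days after labeling with 131I -> 4.0 uCi at labeling.
    print(f"{activity_at_reference(1.0, '131I', 16.2):.1f}")
```

After a month, a ¹³¹I-labeled protein retains under 8% of its activity versus about 71% for ¹²⁵I, which illustrates why long-term storage is impractical with ¹³¹I in particular.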
Iodination of proteins with lactoperoxidase, hydrogen peroxide, and iodide ion proceeds under mild conditions and, like other iodination procedures, results in the formation of both mono- and diiodotyrosine as well as mono- and diiodohistidine residues. In contrast to reactions with I₂ and ICl, however, those with lactoperoxidase involve the formation of a Michaelis-Menten complex between lactoperoxidase and the three reactants (the protein being modified, I⁻, and H₂O₂); the reactions with lactoperoxidase are therefore limited to accessible or surface tyrosine and histidine residues (Morrison and Bayse, 1970; Huber et al., 1989). Because many tyrosine and histidine residues subject to iodination by other methods are not accessible to lactoperoxidase, smaller amounts of iodine are usually incorporated. Because the affected residues are at the surface, however, there is also usually less effect on a protein's structure. For the same reason, reactions with lactoperoxidase are sometimes used as part of a scheme to identify exposed tyrosine and histidine residues of proteins or the surface components of various macromolecular assemblies (Wower et al., 1983; Illy et al., 1991). As with the iodination methods mentioned previously, immobilized forms of lactoperoxidase are available and appear to offer some important advantages over the soluble enzyme, for example, again making it easy to stop reactions and to separate modified proteins from lactoperoxidase (David, 1972).
Tetranitromethane
Tetranitromethane is one of the most widely used reagents to modify tyrosine residues in proteins. The reaction again proceeds optimally under alkaline conditions, converts tyrosine into 3-nitrotyrosine residues (Equation 29), and usually does little else (Sokolovsky et al., 1966; Riordan and Vallee, 1972b).

$$\text{P-Tyr} + \text{C(NO}_2)_4 \longrightarrow \text{P-(3-nitro)Tyr} + \text{C(NO}_2)_3^- + \text{H}^+ \tag{29}$$

Sulfhydryl groups, if present, are usually oxidized, but histidine residues are not usually affected under the conditions typically employed (Sokolovsky et al., 1969). In some cases, especially at high tetranitromethane concentrations, tryptophan residues also appear to undergo nitration (Morihara and Nagami, 1969; Sokolovsky et al., 1970; Prieels et al., 1975), and crosslinking, probably due to the formation of 3,3'-dityrosine residues, is also sometimes observed (Cuatrecasas et al., 1969; Boesel and Carpenter, 1970; Aeschbach et al., 1976). Nitration increases the size and the acidity of tyrosine residues but, because the solvent-accessible tyrosine residues are affected most, it usually has little effect on protein structure. With pKa values around 6.8 to 7.0, most 3-nitrotyrosine residues are largely ionized under physiological conditions, in contrast to tyrosine residues, which usually are not. Under acidic conditions, 3-nitrotyrosine absorbs maximally at about 360 nm (ε = 2,800 M⁻¹ cm⁻¹); the 3-nitrotyrosinate anion obtained under alkaline conditions absorbs maximally at about 428 nm (ε = 4,100-4,200 M⁻¹ cm⁻¹); and there is an isosbestic point at about 381 nm (ε = 2,200 M⁻¹ cm⁻¹). Absorption measurements at either 428 nm under alkaline conditions or 381 nm at any pH are usually employed to determine the extent of nitration (Riordan and Vallee, 1972). Urea appears to cause a slight red shift in the spectrum of 3-nitrotyrosine and increases the absorption of the 3-nitrotyrosinate anion at 428 nm (ε428 = 4,500 M⁻¹ cm⁻¹ in 8 M urea; Malan and Edelhoch, 1970). Higher extinction coefficients for the 3-nitrotyrosinate anion have also been observed in some proteins and in aqueous dioxane (ε428 = 4,800 M⁻¹ cm⁻¹ in 46% dioxane; Myers and Glazer, 1971). 3-Nitrotyrosine is not destroyed under the conditions usually used for acid hydrolysis of proteins and can usually be determined by amino acid analysis.
N-Acetylimidazole
N-Acetylimidazole can be used to acylate exposed tyrosine residues in proteins, as shown in Equation 30, and is widely employed to identify those required for biological activity (Riordan et al., 1965; Riordan and Vallee, 1972a).

$$\text{P-Tyr-OH} + \text{CH}_3\text{CO-Im} \longrightarrow \text{P-Tyr-OCOCH}_3 + \text{Im-H} \tag{30}$$

where CH₃CO-Im is N-acetylimidazole and Im-H is imidazole. Decreases in absorption around 280 nm accompany the reaction and can be used both to follow it and to determine the number of acylated tyrosine residues. A Δε278 value of 1,160 M⁻¹ cm⁻¹ (Riordan et al., 1965) for solvent-exposed tyrosine residues is usually employed for the latter but appears to be low for tyrosine residues in apolar environments and, in such cases, may lead to overestimates of tyrosine acetylation (Myers and Glazer, 1971; Kay et al., 1974). Amino groups also usually react to some extent with N-acetylimidazole under the same conditions, but no other common side chain group gives a stable product. The acetylation of amino groups is not accompanied by an absorption change and cannot be reversed by treatment with hydroxylamine. O-Acetylated tyrosine residues, however, are usually susceptible to hydroxylamine, and such treatment usually regenerates both the lost absorption and at least some of the original biological activity (Riordan et al., 1965; Myers and Glazer, 1971; Riordan and Vallee, 1972; Kay et al., 1974; Yu et al., 1991; Hawes et al., 1995; Mueller et al., 1995).
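The spectrophotometric estimates described for nitrated and O-acetylated tyrosines, and the ionization argument for 3-nitrotyrosine, can be sketched together. This minimal example uses the coefficients quoted in the text (2,200 M⁻¹ cm⁻¹ at the 381-nm isosbestic point; Δε278 = 1,160 M⁻¹ cm⁻¹ for O-acetylation). The sample absorbances, the protein concentration, and the tyrosine pKa of ~10 are assumptions.

```python
"""Sketch: spectrophotometric estimates for nitrated and O-acetylated tyrosines.

Coefficients are those quoted in the text: 3-nitrotyrosine at its 381-nm
isosbestic point (2,200 M^-1 cm^-1, usable at any pH) and the A278 decrease
on O-acetylation of exposed tyrosines (1,160 M^-1 cm^-1). Sample values are
hypothetical, and the text notes 1,160 may overestimate acetylation of
tyrosines in apolar environments.
"""

EPS_NITROTYR_381 = 2_200.0
DELTA_EPS_ACETYL_278 = 1_160.0


def residues_per_protein(delta_a: float, epsilon: float, protein_molar: float,
                         path_cm: float = 1.0) -> float:
    """Modified residues per protein from an absorbance (or absorbance change)."""
    return abs(delta_a) / (epsilon * path_cm * protein_molar)


def fraction_ionized(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch fraction of a phenol present as the phenolate anion."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    protein = 10e-6  # M, hypothetical
    print(f"{residues_per_protein(0.044, EPS_NITROTYR_381, protein):.1f} nitroTyr/protein")       # 2.0
    print(f"{residues_per_protein(-0.0348, DELTA_EPS_ACETYL_278, protein):.1f} acetylTyr/protein")  # 3.0
    # 3-Nitrotyrosine (pKa ~6.9) vs tyrosine (pKa ~10, assumed) at pH 7.4.
    print(f"{fraction_ionized(7.4, 6.9):.2f} vs {fraction_ionized(7.4, 10.0):.4f}")
```

At pH 7.4, roughly three quarters of 3-nitrotyrosine is ionized compared with well under 1% of tyrosine, consistent with the contrast drawn in the text.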
REFERENCES
Abbott, R.E. and Schachter, D. (1976). Impermeant maleimides. Oriented probes of erythrocyte membrane proteins. J. Biol. Chem. 251, 7176-7183.
Aeschbach, R., Amado, R., and Neukom, H. (1976). Formation of dityrosine cross-links in proteins by oxidation of tyrosine residues. Biochim. Biophys. Acta 439, 292-301.
Akabas, M.H., Stauffer, D.A., and Karlin, A. (1992). Acetylcholine receptor channel structure probed in cysteine-substitution mutants. Science 258, 307-310.
Altamirano, M.M., Plumbridge, J.A., and Calcagno, M.L. (1992). Identification of two cysteine residues forming a pair of vicinal thiols in glucosamine-6-phosphate deaminase from E. coli and a study of their functional role by site-directed mutagenesis. Biochemistry 31, 1153-1158.
Amiconi, G., Ascoli, F., Barra, D., Bertollini, A., Matarese, R.M., Verzili, D., and Brunori, M. (1989). Selective oxidation of methionine β(55)D6 at the α1β1 interface in hemoglobin completely destabilizes the T-state. J. Biol. Chem. 264, 17745-17749.
Anderson, B.M., Kim, S.J., and Wang, C.-N. (1970). Inactivation of rabbit muscle L-α-glycerophosphate dehydrogenase by N-alkylmaleimides. Arch. Biochem. Biophys. 138, 66-72.
Anderson, B.M. and Vasini, E.C. (1970). Nonpolar effects in reactions of the sulfhydryl group of papain. Biochemistry 9, 3348-3352.
Anderson, B.M., Anderson, C.D., and Churchich, J.E. (1966). Inhibition of glutamic dehydrogenase by pyridoxal 5'-phosphate. Biochemistry 5, 2893-2900.
Anderson, P.M., Korte, J.J., and Holcomb, T.A. (1994). Reaction of the N-terminal methionine residues in cyanase with diethylpyrocarbonate. Biochemistry 33, 14121-14125.
Artigues, A., Farrant, H., and Schirch, V. (1993). Cytosolic serine hydroxymethyltransferase. Deamidation of asparaginyl residues and degradation in Xenopus laevis oocytes. J. Biol. Chem. 268, 13784-13790.
Ascoli, M. and Puett, D. (1974). Tritium labeling of luteinizing hormone by reductive methylation. Biochim. Biophys. Acta 371, 203-210.
Atassi, M.Z. and Habeeb, A.F.S.A. (1972). Reactions of proteins with citraconic anhydride. Methods Enzymol. 25, 546-553.
Aviram, I., Myer, Y.P., and Schrejter, A. (1981). Stepwise modification of the electrostatic charge of cytochrome c. Effects on protein conformation and oxidation-reduction properties. J. Biol. Chem. 256, 5540-5544.
Bamezai, S., Banez, M.A.T., and Breslow, E. (1990). Structural and functional changes associated with modification of the ubiquitin methionine. Biochemistry 29, 5389-5396.
Bartegi, A., Fattoum, A., and Kassab, R. (1990). Cross-linking of smooth muscle caldesmon to the NH2-terminal region of skeletal F-actin. J. Biol. Chem. 265, 2231-2237.
Bednar, R.A. (1990). Reactivity and pH dependence of thiol conjugation to N-ethylmaleimide: Detection of a conformational change in chalcone isomerase. Biochemistry 29, 3684-3690.
Berleth, E.S., Kasperek, E.M., Grill, S.P., Braunscheidel, J.A., Graziani, L.A., and Pickart, C.M. (1992). Inhibition of ubiquitin-protein ligase (E3) by mono- and bifunctional phenylarsenoxides. Evidence for essential vicinal thiols and a proximal nucleophile. J. Biol. Chem. 267, 16403-16411.
Bhattacharjee, H. and Bhaduri, A. (1992). Distinct functional roles of two active site thiols in UDP glucose 4-epimerase from Kluyveromyces fragilis. J. Biol. Chem. 267, 11714-11720.
Bhattacharyya, D.K., Bandyopadhyay, U., and Banerjee, R.K. (1992). Chemical and kinetic evidence for an essential histidine in horseradish peroxidase for iodide oxidation. J. Biol. Chem. 267, 9800-9804.
Bjork, I. and Ylinenjarvi, K. (1992). Different roles of the two disulfide bonds of the cysteine proteinase inhibitor, chicken cystatin, for the conformation of the active protein. Biochemistry 31, 8597-8602.
Boesel, R.W. and Carpenter, F.H. (1970). Crosslinking during the nitration of bovine insulin with tetranitromethane. Biochem. Biophys. Res. Commun. 38, 678-682.
Borders, C.L. and Riordan, J.F. (1975). An essential arginyl residue at the nucleotide binding site of creatine kinase. Biochemistry 14, 4699-4704.
Borders, C.L., Pearson, L.J., McLaughlin, A.E., Gustafson, M.E., Vasiloff, J., An, F.Y., and Morgan, D.J. (1979). 4-Hydroxy-3-nitrophenylglyoxal. A chromophoric reagent for arginyl residues in proteins. Biochim. Biophys. Acta 568, 491-495.
Brewer, C.F. and Riehm, J.P. (1967). Evidence for possible nonspecific reactions between N-ethylmaleimide and proteins. Analyt. Biochem. 18, 248-255.
Brocklehurst, K. (1982). Two-protonic-state electrophiles as probes of enzyme mechanisms. Methods Enzymol. 87, 426-469.
Browne, B.H. and Kent, D.T. (1975). Formation of nonamidine products in the chemical modification of horse liver alcohol dehydrogenase with imidoesters. Biochem. Biophys. Res. Commun. 67, 133-139.
Brown, S.B., Turner, R.J., Roche, R.S., and Stevenson, K.J. (1987). Spectroscopic characterization of thioredoxin covalently modified with monofunctional organoarsenical reagents. Biochemistry 26, 863-871.
Buechler, J.A. and Taylor, S.S. (1989). Dicyclohexylcarbodiimide cross-links two conserved residues, Asp-184 and Lys-72, at the active site of the catalytic subunit of cAMP-dependent protein kinase. Biochemistry 28, 2065-2070.
Butler, P.J.G., Harris, J.I., Hartley, B.S., and Leberman, R. (1967). Reversible blocking of peptide amino groups by maleic anhydride. Biochem. J. 103, 78p-79p.
Butler, P.J.G., Harris, J.I., Hartley, B.S., and Leberman, R. (1969). The use of maleic anhydride for the reversible blocking of amino groups in polypeptide chains. Biochem. J. 112, 679-689.
Carlsson, J., Drevin, H., and Axen, R. (1978). Protein thiolation and reversible protein-protein conjugation. N-Succinimidyl 3-(2-pyridyldithio)propionate, a new heterobifunctional reagent. Biochem. J. 173, 723-737.
Carraway, K.L. and Koshland, D.E., Jr. (1968). Reaction of tyrosine residues in proteins with carbodiimide reagents. Biochim. Biophys. Acta 160, 274-276.
Carraway, K.L. and Triplett, R.B. (1970). Reaction of carbodiimides with protein sulfhydryl groups. Biochim. Biophys. Acta 200, 564-566.
Carraway, K.L. and Koshland, D.E. (1972). Carbodiimide modification of proteins. Methods Enzymol. 25, 616-623.
Cheung, S. and Fonda, M.L. (1979). Reaction of phenylglyoxal with arginine. The effects of buffers and pH. Biochem. Biophys. Res. Commun. 90, 940-947.
Chong, P.C.S. and Hodges, R.S. (1982). Photochemical cross-linking between rabbit skeletal troponin subunits. Troponin I-troponin T interactions. J. Biol. Chem. 257, 9152-9160.
Coffee, C.J., Bradshaw, R.A., Goldin, B.R., and Frieden, C. (1971). Identification of the sites of modification of bovine liver glutamate dehydrogenase reacted with trinitrobenzenesulfonate. Biochemistry 10, 3516-3526.
Colman, R.F. (1969). The role of sulfhydryl groups in the catalytic function of isocitrate dehydrogenase. I. Reaction with 5,5'-dithiobis(2-nitrobenzoic acid). Biochemistry 8, 888-898.
Covelli, I. and Wolff, J. (1966). Iodohistidine formation in ribonuclease A. J. Biol. Chem. 241, 4444-4451.
Creighton, T.E. (1993). Proteins: Structure and Molecular Properties, Second ed., pp. 4-6. Freeman and Co., New York.
Cuatrecasas, P., Fuchs, S., and Anfinsen, C.B. (1969). Cross-linking of aminotyrosyl residues in the active site of staphylococcal nuclease. J. Biol. Chem. 244, 406-412.
Cunningham, L.W. and Nuenke, B.J. (1961). Physical and chemical studies of a limited reaction of iodine with proteins. J. Biol. Chem. 234, 1447-1451.
Cunningham, L.W., Crews, B.C., and Gettins, P. (1990). Inhibition and partial reversal of the methylamine-induced conversion of "slow" to "fast" electrophoretic forms of human α2-macroglobulin by modification of the thiols. Biochemistry 29, 1638-1643.
David, G.S. (1972). Solid state lactoperoxidase: A highly stable enzyme for simple, gentle iodination of proteins. Biochem. Biophys. Res. Commun. 48, 464-471.
Davies, G.E. and Stark, G.R. (1970). Use of dimethyl suberimidate, a cross-linking reagent, in studying the subunit structure of oligomeric proteins. Proc. Natl. Acad. Sci. USA 66, 651-656.
Di Donato, A., Ciardiello, M.A., de Nigris, M., Piccoli, R., Mazzarella, L., and D'Alessio, G. (1993). Selective deamidation of ribonuclease A. Isolation and characterization of the resulting isoaspartyl and aspartyl derivatives. J. Biol. Chem. 268, 4745-4751.
Dixon, H.B.F. and Fields, R. (1972). Specific modification of NH2-terminal residues by transamination. Methods Enzymol. 25, 409-419.
Dixon, M. and Webb, E.C. (1964). Enzymes, Second ed., p. 144. Academic Press, New York.
Dormann, P., Borchers, T., Korf, U., Hojrup, P., Roepstorff, P., and Spener, F. (1993). Amino acid exchange and covalent modification by cysteine and glutathione explain isoforms of fatty acid-binding protein occurring in bovine liver. J. Biol. Chem. 268, 16286-16292.
Dudkin, S.M., Karabachyan, L.V., Borisova, S.N., Shlyapnikov, S.V., Karperisky, M.Y., and Geidarov, T.G. (1975). Spectral properties of phosphopyridoxyl-Lys-7(-41)-ribonuclease A. Biochim. Biophys. Acta 386, 275-282.
Edelhoch, H. (1962). The properties of thyroglobulin VIII. The reaction of thyroglobulin. J. Biol. Chem. 237, 2778-2787.
Ellman, G.L. (1959). Tissue sulfhydryl groups. Arch. Biochem. Biophys. 72, 70-77.
Epperly, B.R. and Dekker, E.E. (1989). Inactivation of E. coli L-threonine dehydrogenase by 2,3-butanedione. Evidence for a catalytically essential arginine residue. J. Biol. Chem. 264, 18296-18301.
Eyzaguirre, J. (Ed.) (1987). Chemical Modification of Enzymes. Ellis Horwood Limited, Chichester, England.
Fasman, G.D. (1974). The Handbook of Biochemistry and Molecular Biology: Physical and Chemical Data, 2nd ed., vol. 1, pp. 151-265. CRC Press, Cleveland, OH.
Fields, R. (1971). The measurement of amino groups in proteins and peptides. Biochem. J. 124, 581-590.
Fields, R. (1972). The rapid determination of amino groups with TNBS. Methods Enzymol. 25, 464-468.
Fraker, P.J. and Speck, J.C., Jr. (1978). Protein and cell membrane iodinations with a sparingly soluble chloroamide, 1,3,4,6-tetrachloro-3α,6α-diphenylglycoluril. Biochem. Biophys. Res. Commun. 80, 849-857.
Frost, S.C. and Lane, M.D. (1985). Evidence for the involvement of vicinal sulfhydryl groups in insulin-activated hexose transport by 3T3-L1 adipocytes. J. Biol. Chem. 260, 2646-2652.
Garza-Ramos, G., Gomez-Puyou, M.T., Gomez-Puyou, A., Yuksel, U., and Gracy, R.W. (1994). Deamidation of triosephosphate isomerase in reverse micelles: Effects of water on catalysis and molecular wear and tear. Biochemistry 33, 6960-6965.
Geiger, T. and Clarke, S. (1987). Deamidation, isomerization, and racemization at asparaginyl and aspartyl residues in peptides. Succinimide-linked reactions that contribute to protein degradation. J. Biol. Chem. 262, 785-794.
Geoghegan, K.F. and Stroh, J.G. (1992). Site-directed conjugation of nonpeptide groups to peptides and proteins via periodate oxidation of a 2-amino alcohol. Application to modification at N-terminal serine. Bioconj. Chem. 3, 138-146.
Glazer, A.N. and Delange, R.J. (1975). Chemical Modification of Proteins. (Work, T.S. and Work, E., Eds.), North-Holland Pub. Co., Amsterdam, The Netherlands.
Goldenberg, D.P., Bekeart, L.S., Laheru, D.A., and Zhou, J.D. (1993). Probing the determinants of disulfide stability in native pancreatic trypsin inhibitor. Biochemistry 32, 2835-2844.
Goldfarb, A.R. (1970). Reactivity of amino groups in proteins. Biochim. Biophys. Acta 200, 1-8.
Goldfarb, A.R. (1974). Reactions of the amino groups in ribonuclease A. 1. Kinetic studies. Bioorg. Chem. 3, 249-259.
Grassetti, D.R. and Murray, J.F. (1967). Determination of sulfhydryl groups with 2,2'- or 4,4'-dithiodipyridine. Arch. Biochem. Biophys. 119, 41-49.
Gravina, S.A. and Mieyal, J.J. (1993). Thioltransferase is a specific glutathionyl mixed disulfide oxidoreductase. Biochemistry 32, 3368-3376.
Greenwood, F.C., Hunter, W.M., and Glover, J.S. (1963). The preparation of 131I-labelled human growth hormone of high specific activity. Biochem. J. 89, 114-123.
Grimshaw, C.E., Whistler, R.L., and Cleland, W.W. (1979). Ring opening and closing rates for thiosugars. J. Am. Chem. Soc. 101, 1521-1532.
Hartman, F.C., Milanez, S., and Lee, E.H. (1985). Ionization constants of two active-site lysyl epsilon-amino groups of ribulosebisphosphate carboxylase/oxygenase. J. Biol. Chem. 260, 13968-13975.
Haugland, R.P. (1989). Handbook of Fluorescent Probes and Research Chemicals. Molecular Probes Inc., Eugene, Oregon.
Hawes, J.W., Crabb, D.W., Chan, R.M., Rougraff, P.M., and Harris, R.A. (1995). Chemical modification and site-directed mutagenesis studies of rat 3-hydroxyisobutyrate dehydrogenase. Biochemistry 34, 4231-4237.
Hirs, C.H.W. (Ed.) (1967). Methods Enzymol. Vol. 11. Academic Press, New York.
Hirs, C.H.W. and Timasheff, S.N. (Eds.) (1972). Methods Enzymol. Vol. 25. Academic Press, New York.
Hirs, C.H.W. and Timasheff, S.N. (Eds.) (1977). Methods Enzymol. Vol. 47. Academic Press, New York.
Hirs, C.H.W. and Timasheff, S.N. (Eds.) (1983). Methods Enzymol. Vol. 91. Academic Press, New York.
Hoare, D.G. and Koshland, D.E. (1967). A method for the quantitative modification and estimation of carboxylic acid groups in proteins. J. Biol. Chem. 242, 2447-2453.
Horinishi, H., Nakaya, K., Tani, A., and Shibata, K. (1968). States of amino acid residues in proteins. XV. Ethyl-morpholinyl-propylcarbodiimide for modification of carboxyl groups in proteins. J. Biochem. (Tokyo) 63, 41-50.
Houghton, R.A. and Li, C.H. (1983). Reduction of sulfoxides in peptides and proteins. Methods Enzymol. 91, 549-559.
Huber, R.E., Edwards, L.A., and Carne, T.J. (1989). Studies on the mechanism of the iodination of tyrosine by lactoperoxidase. J. Biol. Chem. 264, 1381-1386.
Hunter, M.J. and Ludwig, M.L. (1962). The reaction of imidoesters with proteins and related small molecules. J. Am. Chem. Soc. 84, 3491-3504.
Huynh, Q.K. and Snell, E.E. (1986). Pyruvoyl-dependent histidine decarboxylase from Lactobacillus 30a. Covalent modifications of aspartic acid 191, lysine 155, and the pyruvoyl group. J. Biol. Chem. 261, 4389-4394.
Illy, C., Thielens, N.M., Gagnon, J., and Arlaud, G.J. (1991). Effect of lactoperoxidase-catalyzed iodination on the Ca2+-dependent interactions of human C1s. Location of the iodination sites. Biochemistry 30, 7135-7141.
Imoto, T. and Yamada, H. (1989). Chemical modification. In: Protein Function: A Practical Approach. (Creighton, T.E., Ed.), pp 247-277. IRL Press, Oxford, England.
Inman, J.K., Perham, R.N., DuBois, G.C., and Appella, E. (1983). Amidination. Methods Enzymol. 91, 559-569.
Jentoft, J.E., Jentoft, N., Gerkin, T.A., and Dearborn, D.G. (1979). 13C NMR studies of ribonuclease A methylated with [13C]formaldehyde. J. Biol. Chem. 254, 4366-4370.
Jentoft, N. and Dearborn, D.G. (1979). Labeling of proteins by reductive methylation using sodium cyanoborohydride. J. Biol. Chem. 254, 4359-4365.
Jentoft, N. and Dearborn, D.G. (1983). Protein labeling by reductive alkylation. Methods Enzymol. 91, 570-579.
Johnson, B.A., Shirokawa, J.M., Hancock, W.S., Spellman, M.W., Basa, L.J., and Aswad, D.W. (1989). Formation of isoaspartate at two distinct sites during in vitro aging of human growth hormone. J. Biol. Chem. 264, 14262-14271.
Jue, R., Lambert, J.M., Pierce, L.R., and Traut, R.R. (1978). Addition of sulfhydryl groups to E. coli ribosomes by protein modification with 2-iminothiolane (methyl 4-mercaptobutyrimidate). Biochemistry 17, 5399-5406.
Kalgutkar, A.S. and Marnett, L.J. (1994). Rapid inactivation of prostaglandin endoperoxide synthases by N-(carboxyalkyl)maleimides. Biochemistry 33, 8625-8628.
Kastenschmidt, L.L., Kastenschmidt, J., and Helmreich, E. (1968). The effect of temperature on the allosteric transitions of rabbit skeletal muscle phosphorylase b. Biochemistry 7, 3590-3608.
Kay, E., Strickland, E.H., and Billups, C. (1974). Near ultraviolet circular dichroism and absorption spectra of chicken ovomucoid and acetylated derivatives at 297 and 77 degrees K. J. Biol. Chem. 249, 797-802.
Kenyon, G.L. and Bruice, T.W. (1977). Novel sulfhydryl reagents. Methods Enzymol. 47, 407-430.
Kindman, L.A. and Jencks, W.P. (1981). Modification and inactivation of CoA transferase by 2-nitro-5-(thiocyanato)benzoate. Biochemistry 20, 5183-5187.
Kirley, T.L. (1990). Inactivation of Na+/K+-ATPase by β-mercaptoethanol. Differential sensitivity to reduction of the three β subunit disulfide bonds. J. Biol. Chem. 265, 4227-4232.
Konishi, K. and Fujioka, M. (1987). Chemical modification of a functional arginine residue of rat liver glycine methyltransferase. Biochemistry 26, 8496-8502.
Koshland, M.E., Englberger, F.M., Erwin, M.J., and Gaddone, S.M. (1963). Modification of amino acid residues in anti-p-azobenzenearsonic acid antibody during extensive iodination. J. Biol. Chem. 238, 1343-1348.
Kurosky, A. and Hofmann, T. (1972). Kinetics of the reaction of nitrous acid with model compounds and proteins, and the conformational state of N-terminal groups in the chymotrypsin family. Canadian J. Chem. 50, 1282-1296.
Kyte, J. (1995). Structure in Protein Chemistry, pg 59. Garland Pub. Inc., New York.
Lawrence, D.A. and Loskutoff, D.J. (1986). Inactivation of plasminogen activator inhibitor by oxidants. Biochemistry 25, 6351-6355.
Le, M. and Means, G.E. (1995). A procedure for the determination of monothiols in the presence of dithiothreitol: An improved assay for the reduction of disulfides. Analyt. Biochem. 229, 264-271.
Lei, Y., Ploux, O., and Liu, H. (1995). Mechanistic studies on CDP-6-deoxy-L-threo-D-glycero-4-hexulose-3-dehydrase: Identification of His-220 as the active-site base by chemical modification and site-directed mutagenesis. Biochemistry 34, 4643-4654.
Levison, B.S., Wiemels, J., Szasz, J., and Sternlicht, H. (1989). Ethoxyformylation of tubulin with H-diethyl pyrocarbonate: A reexamination of the mechanism of assembly inhibition. Biochemistry 28, 8877-8884.
Levitt, M. (1976). A simplified representation of protein conformations for rapid simulation of protein folding. J. Mol. Biol. 104, 59-107.
Lewis, C.T., Seyer, J.M., Cassell, R.G., and Carlson, G.M. (1993). Identification of vicinal thiols of phosphoenolpyruvate carboxykinase (GTP). J. Biol. Chem. 268, 1628-1636.
Li, J. and Pickart, C.M. (1995). Binding of phenylarsenoxide to Arg-tRNA protein transferase is independent of vicinal thiols. Biochemistry 34, 139-147.
Lin, C., Mihal, K.A., and Krueger, R.J. (1990). Introduction of sulfhydryl groups into proteins at carboxyl sites. Biochim. Biophys. Acta 1038, 382-389.
Lin, T.-Y. and Koshland, D.E. (1969). Carboxyl group modification and the activity of lysozyme. J. Biol. Chem. 244, 505-508.
Lin, T.-Y. and Kim, P.S. (1989). Urea dependence of thiol-disulfide equilibria in thioredoxin: Confirmation of the linkage relationship and a sensitive assay for structure. Biochemistry 28, 5282-5287.
Lopez, S., Miyashita, Y., and Simons, S. (1990). Structurally based, selective interaction of arsenite with steroid receptors. J. Biol. Chem. 265, 16039-16042.
Ludwig, M.L. and Hunter, M.J. (1967). Amidination. Methods Enzymol. 11, 595-604.
Lundblad, R.L. (1991). Chemical Reagents for Protein Modification, Second ed. CRC Press, Boca Raton.
Lundblad, R.L. (1995). Techniques in Protein Modification. CRC Press, Boca Raton.
Lundblad, R.L. and Noyes, C.M. (1984). Chemical Reagents for Protein Modification, Vols. 1 and 2. CRC Press, Boca Raton.
Lura, R. and Schirch, V. (1988). Role of peptide conformation in the rate and mechanism of deamidation of asparaginyl residues. Biochemistry 27, 7671-7677.
Makoff, A.J. and Malcolm, A.D. (1981). Properties of methyl acetimidate and its use as a protein-modifying reagent. Biochem. J. 193, 245-249.
Malan, P.G. and Edelhoch, H. (1970). Nitration of human serum albumin and bovine and human goiter thyroglobulins with tetranitromethane. Biochemistry 9, 3205-3214.
Markwell, M.A.K. (1982). A new solid-state reagent to iodinate proteins. I. Conditions for the efficient labeling of antiserum. Analyt. Biochem. 125, 427-432.
May, J.M. (1989). Selective labeling of the erythrocyte hexose carrier with a maleimide derivative of glucosamine: Relationship of an exofacial sulfhydryl to carrier conformation and structure. Biochemistry 28, 1718-1725.
Means, G.E. (1977). Reductive alkylation of amino groups. Methods Enzymol. 47, 469-478.
Means, G.E. and Feeney, R.E. (1968). Reductive alkylation of amino groups in proteins. Biochemistry 7, 2192-2201.
Means, G.E. and Feeney, R.E. (1971a). Chemical Modification of Proteins. Holden-Day Inc., San Francisco, CA.
Means, G.E. and Feeney, R.E. (1971b). Affinity labeling of pancreatic ribonuclease. J. Biol. Chem. 246, 5532-5534.
Means, G.E. and Feeney, R.E. (1993). In: Biochemistry LabFax. (Chambers, J.A.A. and Rickwood, D., Eds.), pp 215-245. Bios Scientific Publishers, Oxford.
Means, G.E. and Feeney, R.E. (1995). Reductive alkylation of proteins. Analyt. Biochem. 224, 1-16.
Meinwald, Y.C., Stimson, E.R., and Scheraga, H.A. (1986). Deamidation of the asparaginyl-glycyl sequence. Int. J. Peptide Protein Res. 28, 70-84.
Melchior, W.B. and Fahrney, D. (1970). Ethoxyformylation of proteins. Reaction of ethoxyformic anhydride with α-chymotrypsin, pepsin, and pancreatic ribonuclease at pH 4. Biochemistry 9, 251-258.
Miles, A.M. and Smith, R.L. (1993). Functional methionines in the collagen/gelatin binding domain of plasma fibronectin: Effects of chemical modification by chloramine T. Biochemistry 32, 8168-8178.
Miles, E.W. (1977). Modification of histidyl residues in proteins by diethylpyrocarbonate. Methods Enzymol. 47, 431-442.
Miller, S., Janin, J., Lesk, A.M., and Chothia, C. (1987). Interior and surface of monomeric proteins. J. Mol. Biol. 196, 641-656.
Mitchinson, C. and Wells, J.A. (1989). Protein engineering of disulfide bonds in subtilisin BPN'. Biochemistry 28, 4807-4815.
Moore, J.E. and Ward, W.H. (1956). Cross-linking of bovine plasma albumin and wool keratin. J. Am. Chem. Soc. 78, 2414-2418.
Morand, L.Z., Frame, M.K., Colvert, K.K., Johnson, D.A., Krogmann, D.W., and Davis, D.J. (1989). Plastocyanin cytochrome f interaction. Biochemistry 28, 8039-8047.
Morihara, K. and Nagami, K. (1969). Tryptophan residue in the active site of papain. J. Biochem. 65, 321-323.
Morrison, M. and Bayse, G.S. (1970). Catalysis of iodination by lactoperoxidase. Biochemistry 9, 2995-3000.
Mueller, M.J., Samuelsson, B., and Haeggstrom, J.Z. (1995). Chemical modification of leukotriene A4 hydrolase. Indications for essential tyrosyl and arginyl residues at the active site. Biochemistry 34, 3536-3543.
Myers, B. and Glazer, A.N. (1971). Spectroscopic studies of the exposure of tyrosine residues in proteins with special reference to the subtilisins. J. Biol. Chem. 246, 412-419.
Nanduri, V.B. and Modak, M.J. (1990). Lysine-329 of murine leukemia virus reverse transcriptase: Possible involvement in the template-primer binding function. Biochemistry 29, 5258-5264.
Ohnishi, M., Kawagishi, T., Abe, T., and Hiromi, K. (1980). Stopped-flow studies on the chemical modification with N-bromosuccinimide of model compounds of tryptophan residues. J. Biochem. (Tokyo) 87, 273-279.
Okuyama, T. and Satake, K. (1960). On the preparation and properties of 2,4,6-trinitrophenyl-amino acids and peptides. J. Biochem. (Tokyo) 47, 454-466.
Palmer, M., Jursch, R., Weller, U., Velva, A., Hilgert, K., Kehoe, M., and Bhakdi, S. (1993). Staphylococcus aureus α-toxin. Production of functionally intact, site-specifically modifiable protein by introduction of cysteine at positions 69, 130, and 186. J. Biol. Chem. 268, 11959-11962.
Pan, R.J. and Cherry, R.J. (1995). Evidence that eosin-5-maleimide binds close to the anion transport site of human erythrocyte band 3: A fluorescence quenching study. Biochemistry 34, 4880-4888.
Pasta, P., Mazzola, G., and Carrea, G. (1987). Chemical modification of 3α,20β-hydroxysteroid dehydrogenase with diethyl pyrocarbonate. Evidence for an essential, highly reactive, lysyl residue. Biochemistry 26, 1247-1251.
Pathak, R., Hendrickson, T.L., and Imperiali, B. (1995). Sulfhydryl modification of the yeast Wbp1p inhibits oligosaccharyl transferase activity. Biochemistry 34, 4179-4185.
Pierce Chemical Co. (1994a). Catalog and Handbook, pp. O-90-O-110 and T-155-T-200.
Pierce Chemical Co. (1994b). Catalog and Handbook, pp. T-139-T-140.
Plapp, B.V. (1970). Enhancement of the activity of horse liver alcohol dehydrogenase by modification of amino groups at the active sites. J. Biol. Chem. 245, 1727-1735.
Ploux, O., Lei, Y., Vatanen, K., and Liu, H. (1995). Mechanistic studies on CDP-6-deoxy-Δ3,4-glucoseen reductase: The role of cysteine residues in catalysis as probed by chemical modification and site-directed mutagenesis. Biochemistry 34, 4159-4168.
Prieels, J.-P., Dolmans, M., Leonis, J., and Brew, K. (1975). Nitration of tyrosyl residues in human α-lactalbumin. Effect on lactose synthase specifier activity. Eur. J. Biochem. 60, 533-539.
Pullan, L.M. and Noltmann, E.A. (1985). Specific arginine modification at the phosphatase site of muscle carbonic anhydrase. Biochemistry 24, 635-640.
Rice, R.H. and Means, G.E. (1971). Radioactive labeling of proteins in vitro. J. Biol. Chem. 246, 831-832.
Riddles, P.W., Blakeley, R.L., and Zerner, B. (1979). Ellman's reagent: 5,5'-dithiobis(2-nitrobenzoic acid): A reexamination. Analyt. Biochem. 94, 75-81.
Riley, M. and Perham, R.N. (1973). The reaction of protein amino groups with methyl 5-iodopyridine-2-carboximidate. A possible general method of preparing isomorphous heavy-atom derivatives of proteins. Biochem. J. 131, 625-635.
Riordan, J.F. (1973). Functional arginyl residues in carboxypeptidase A. Modification with butanedione. Biochemistry 12, 3915-3923.
Riordan, J.F., Wacker, W.E.C., and Vallee, B.L. (1965). N-Acetylimidazole: A reagent for determination of "free" tyrosyl residues of proteins. Biochemistry 4, 1758-1765.
Riordan, J.F. and Vallee, B.L. (1972a). O-Acetyltyrosine. Methods Enzymol. 25, 500-506.
Riordan, J.F. and Vallee, B.L. (1972b). Nitration with tetranitromethane. Methods Enzymol. 25, 515-521.
Rippa, M.L., Spanio, L., and Pontremoli, S. (1967). A specific interaction of pyridoxal 5'-phosphate and 6-phosphogluconic dehydrogenase. Arch. Biochem. Biophys. 118, 48-57.
Santi, D.V., Ouyong, T.M., Tan, A.K., Gregory, D.L., Scanlan, T., and Carreras, C.W. (1993). Interaction of thymidylate synthase with pyridoxal 5'-phosphate as studied by UV/visible difference spectroscopy and molecular modeling. Biochemistry 32, 11819-11824.
Siedler, F., Rudolph-Bohner, S., Doi, M., Musiol, H.J., and Moroder, L. (1993). Redox potentials of active-site bis(cysteinyl) fragments of thiol-protein oxidoreductases. Biochemistry 32, 7488-7495.
Sharma, S., Hammen, P.K., Anderson, J.W., Leung, A., George, F., Hengstenberg, W., Klevit, R.E., and Waygood, E.B. (1993). Deamidation of HPr, a phosphocarrier protein of the phosphoenolpyruvate: Sugar phosphotransferase system, involves asparagine 38 (HPr-1) and asparagine 12 (HPr-2) in isoaspartyl acid formation. J. Biol. Chem. 268, 17695-17704.
Shechter, Y., Burstein, Y., and Patchornik, A. (1975). Selective oxidation of methionine residues in proteins. Biochemistry 14, 4497-4503.
Shechter, Y., Patchornik, A., and Burstein, Y. (1976). Selective chemical cleavage of tryptophanyl peptide bonds by oxidative chlorination with N-chlorosuccinimide. Biochemistry 15, 5071-5075.
Shetty, J.K. and Kinsella, J.E. (1980). Ready separation of proteins from nucleoprotein complexes by reversible modification of lysine residues. Biochem. J. 191, 269-272.
Shields, G.S., Hill, R.L., and Smith, E.L. (1959). Preparation and properties of guanidinated mercuripapain. J. Biol. Chem. 234, 1747-1753.
Simons, S.S. and Pratt, W.B. (1995). Glucocorticoid receptor thiols and steroid-binding activity. Methods Enzymol. 251, 406-422.
Singh, R. and Whitesides, G.M. (1994). Reagents for rapid reduction of native disulfide bonds in proteins. Bioorg. Chem. 22, 109-115.
Smith, D.J. and Kenyon, G.L. (1974). Nonessentiality of the active sulfhydryl groups of rabbit muscle creatine kinase. J. Biol. Chem. 249, 3317-3318.
Sokolovsky, M., Riordan, J.F., and Vallee, B.L. (1966). Tetranitromethane. A reagent for the nitration of tyrosyl residues in proteins. Biochemistry 5, 3582-3589.
Sokolovsky, M., Harell, D., and Riordan, J.F. (1969). Reaction of tetranitromethane with sulfhydryl groups in proteins. Biochemistry 8, 4740-4745.
Sokolovsky, M., Fuchs, M., and Riordan, J.F. (1970). Reaction of tetranitromethane with tryptophan and related compounds. FEBS Letters 7, 167-170.
Spande, T.F., Green, N.M., and Witkop, B. (1966). The reactivity toward N-bromosuccinimide of tryptophan in enzymes, zymogens, and inhibited enzymes. Biochemistry 5, 1926-1933.
Spande, T.F. and Witkop, B. (1967). Determination of the tryptophan content of proteins with N-bromosuccinimide. Methods Enzymol. 11, 498-506.
Stancato, L.F., Hutchison, K.A., Chakraborti, P.K., Simons, S.S., and Pratt, W.B. (1993). Differential effects of the reversible thiol-reactive agents arsenite and methyl methanethiosulfonate on steroid binding by the glucocorticoid receptor. Biochemistry 32, 3729-3736.
Stauffer, C.E. and Etson, D. (1969). The effect on subtilisin activity of oxidizing a methionine residue. J. Biol. Chem. 244, 5333-5338.
Stauffer, D.A. and Karlin, A. (1994). Electrostatic potential of the acetylcholine binding sites in the nicotinic receptor probed by reactions of binding-site cysteines with charged methanethiosulfonates. Biochemistry 33, 6840-6849.
Stephenson, R.C. and Clark, S. (1989). Succinimide formation from aspartyl and asparaginyl peptides as a model for the spontaneous degradation of proteins. J. Biol. Chem. 264, 6164-6170.
Stevenson, K.J., Hale, G., and Perham, R.N. (1978). Inhibition of pyruvate dehydrogenase multienzyme complex from E. coli with mono- and bifunctional arsenoxides. Biochemistry 17, 2189-2192.
Stole, E. and Meister, A. (1991). Interaction of γ-glutamyl transpeptidase with glutathione involves specific arginine and lysine residues of the heavy subunit. J. Biol. Chem. 266, 17850-17857.
Stuchbury, T., Shipton, M., Norris, R., Malthouse, J.P.G., and Brocklehurst, K. (1975). A reporter group delivery system with both absolute and selective specificity for thiol groups and an improved fluorescent probe containing the 7-nitrobenzo-2-oxa-1,3-diazole moiety. Biochem. J. 151, 417-432.
Tack, B.F., Dean, J., Eilat, D., Lorenz, P.E., and Schecter, A.N. (1980). Tritium labeling of proteins to high specific radioactivity by reductive methylation. J. Biol. Chem. 255, 8842-8847.
Takahashi, K. (1968). The reaction of phenylglyoxal with arginine residues in proteins. J. Biol. Chem. 243, 6171-6179.
Toniyama, Y., Seko, C., and Kikuchi, M. (1990). Secretion in yeast of mutant human lysozymes with and without glutathione bound to cysteine 95. J. Biol. Chem. 265, 16767-16771.
Tuong, A., Maftouh, M., Ponthus, C., Whitechurch, O., Roitsch, C., and Picard, C. (1992). Characterization of the deamidated forms of recombinant hirudin. Biochemistry 31, 8291-8299.
Turina, P., Aggeler, R., Lee, R.S.F., Senior, A.E., and Capaldi, R.A. (1993). The cysteine introduced into the subunit of the E. coli F1-ATPase by the mutation R376C is near the αβ subunit interface and close to a noncatalytic nucleotide binding site. J. Biol. Chem. 268, 6978-6984.
Tyagi, S.C. (1991). Reversible inhibition of neutrophil elastase by thiol-modified α-1 protease inhibitor. J. Biol. Chem. 266, 5279-5285.
Tyler-Cross, R. and Schirch, V. (1991). Effects of amino acid sequence, buffers, and ionic strength on the rate and mechanism of deamidation of asparagine residues in small peptides. J. Biol. Chem. 266, 22549-22556.
Wagner, O., Irion, E., Arens, A., and Bauer, K. (1969). Partially deaminated L-asparaginase. Biochem. Biophys. Res. Commun. 37, 383-392.
Wang, A.-Y., Grogan, D.W., and Cronan, J.E. (1992). Cyclopropane fatty acid synthase of E. coli: Deduced amino acid sequence, purification, and studies of the enzyme active site. Biochemistry 31, 11020-11028.
Wang, T.-T. and Young, N.M. (1978). Modification of aspartic acid residues to induce trypsin cleavage. Analyt. Biochem. 91, 696-699.
Wassarman, P.M. and Major, J.P. (1969). The reactivity of the sulfhydryl groups of lobster muscle glyceraldehyde 3-phosphate dehydrogenase. Biochemistry 8, 1076-1082.
Webb, J.L. (1966). Enzyme and Metabolic Inhibitors, Vol. 3, pp. 608-658. Academic Press, New York.
Wells, T.N.C., Scully, P., and Magnenat, E. (1994). Arginine 304 is an active site residue in phosphomannose isomerase from Candida albicans. Biochemistry 33, 5777-5782.
Weltman, J.K., Szaro, R.P., Frackelton, A.R., Dowben, R.M., Bunting, J.R., and Cathou, B.E. (1973). N-(3-pyrene)maleimide: A long lifetime fluorescent sulfhydryl reagent. J. Biol. Chem. 248, 3173-3177.
Wetzel, R., Perry, L.J., Basse, W.A., and Becktel, W.J. (1988). Disulfide bonds and thermal stability in T4 lysozyme. Proc. Natl. Acad. Sci. USA 85, 401-405.
Wetzel, R., Halualani, R., Stults, J.T., and Quan, C. (1990). A general method for highly selective cross-linking of unprotected polypeptides via pH-controlled modification of N-terminal α-amino groups. Bioconj. Chem. 1, 114-122.
Wilbur, D.S. (1992). Radiohalogenation of proteins: An overview of radionuclides, labeling methods, and reagents for conjugate labeling. Bioconj. Chem. 3, 433-470.
Wilson, J.M., Wu, D., Motiu-DeGrood, R., and Hupe, D.J. (1980). A spectrophotometric method for studying the rates of reaction of disulfides with protein thiol groups applied to bovine serum albumin. J. Am. Chem. Soc. 102, 359-363.
Wong, S.S. (1991). Chemistry of Protein Conjugation and Cross-Linking. CRC Press, Boca Raton, FL.
Wower, J., Maly, P., Zobawa, M., and Brimacombe, R. (1983). Identification of tyrosine residues that are susceptible to lactoperoxidase-catalyzed iodination on the surface of E. coli 30S ribosomal subunit. Biochemistry 22, 2339-2346.
Wright, H.T. (1991). Sequence and structure determinants of the nonenzymatic deamidation of asparagine and glutamine residues in proteins. Protein Eng. 4, 283-294.
Yamada, H., Imoto, T., Fujita, K., Okazaki, K., and Motomura, M. (1981). Selective modification of aspartic acid-101 in lysozyme by carbodiimide reaction. Biochemistry 20, 4836-4842.
Yamada, H., Kuroki, R., Hirata, M., and Imoto, T. (1983). Intramolecular cross-linkage of lysozyme. Imidazole catalysis of the formation of the cross-link between lysine-13 ε-amino and leucine-129 α-carboxyl by carbodiimide reaction. Biochemistry 22, 4551-4556.
Yamasaki, R.B., Vega, A., and Feeney, R.E. (1980). Modification of available arginine residues in proteins by p-hydroxyphenylglyoxal. Analyt. Biochem. 109, 32-40.
Yamasaki, R.B., Shimer, D.A., and Feeney, R.E. (1981). Colorimetric determination of arginine residues in proteins by p-nitrophenylglyoxal. Analyt. Biochem. 111, 220-226.
Yamashita, H., Nakatsuka, T., and Hirose, M. (1995). Structural and functional characteristics of partially disulfide-reduced intermediates of ovotransferrin N lobe. Cystine localization by indirect end-labeling approach and implications for the reduction pathway. J. Biol. Chem. 270, 29806-29812.
Yu, B., Pereira, M.E., and Wilk, S. (1991). Chemical modification of the bovine pituitary multicatalytic proteinase complex by N-acetylimidazole. Reversible activation of casein hydrolysis. J. Biol. Chem. 266, 17396-17400.
Zahler, W.L. and Cleland, W.W. (1968). A specific and sensitive assay for disulfides. J. Biol. Chem. 243, 716-719.
Zapun, A., Bardwell, J.C.A., and Creighton, T.E. (1993). The reactive and destabilizing disulfide bond of DsbA, a protein required for protein disulfide bond formation in vivo. Biochemistry 32, 5083-5092.
Zhang, M. and Vogel, H.J. (1993). Determination of the side chain pKa values of the lysine residues in calmodulin. J. Biol. Chem. 268, 22420-22428.
Chapter 3
Electrostatic Effects in Proteins: Experimental and Computational Approaches
Norma M. Allewell, Himanshu Oberoi, Meena Hariharan, and Vince J. LiCata
Introduction 62
Basic Principles 64
  Ionization and pKa Values 64
  Ionizable Groups in Proteins 64
  Titration Curves and Isoelectric Points 66
  Dipoles in Proteins 67
  Dielectric Constant 67
  Factors Influencing the pKa Values of Ionizable Groups in Proteins 68
  Potential Surfaces 69
Experimental Approaches 69
  Ion Exchange Chromatography 69
  Electrophoresis and Isoelectric Focusing 70
  Dependence of Global Properties on pH 73
  Site-Directed Mutagenesis 73
  Individual Site Titrations 74
  Linkage Theory 74
Examples of Experimental Studies 74
  Catalytic Mechanisms 74
  Long-Range Interactions 75
  Site-Directed Mutagenesis 76
  Single-Site Titrations 77
  Determining Surface Charge 77
  Interactions with Lipids 78
Theoretical Approaches 78
  Macroscopic versus Microscopic Models 78
  Solutions of the Poisson-Boltzmann Equation 79
  Calculating Free Energies and pKa Values 82
  Calculating Titration Curves 83
Examples of Theoretical Studies 84
  Protein Stability 84
  Estimating pKa Values 85
  Free Energies of Ligand Binding 86
  Kinetics of Ligand Binding 88
  Enzyme Mechanisms 89
  Redox Potentials 90
  Interactions with Lipids 90
  Hormone-Receptor Interactions 91
Future Prospects 91

Protein: A Comprehensive Treatise, Volume 2, pages 61-97. Copyright © 1999 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 1-55938-672-X
INTRODUCTION
Seven of the 20 naturally occurring amino acids in proteins have ionizable side chains, and the amino and carboxyl termini of most proteins are also ionizable. This subset of groups is responsible for many of the functions carried out by proteins and enzymes. Among the most important of these functions are interactions with substrates, other proteins, ions, and solvent through hydrogen bonds, electrostatic bonds (or "salt bridges"), and van der Waals interactions. The importance of electrostatic effects in protein structure and function first became evident from two observations: the solubility of proteins varies with pH and ionic strength, and enzyme activity is sensitive to pH. Electrostatic effects were quickly exploited in protein purification, through isoelectric precipitation, precipitation by salts such as (NH4)2SO4, ion exchange chromatography, electrophoresis, and isoelectric focusing. The sensitivity of enzyme activity to pH soon led to an appreciation of the importance of ionizable groups at the active site in binding and catalysis. As the variety of proteins studied expanded to include receptors, ion channels, nucleic acid binding proteins, and proteins of the immune system, so too did the recognition that electrostatic effects are involved in the function of the vast majority of proteins. Studies of the Bohr effect in hemoglobin first demonstrated the importance of electrostatics in protein-protein interactions and the regulation of function. Ionizable groups are also crucial to protein folding and stability, as illustrated by the commonly observed effects of pH and salt on stability.

To understand the physical basis of these effects, the individual ionizable groups involved must be identified and characterized. Early studies relied upon fitting observations of protein behavior as a function of pH to models involving one or two (and occasionally three) ionizable groups. This very naturally led to the conclusion that only a few of the many ionizable groups in a protein are essential to function. Several famous applications of this approach to small, well-studied enzymes have stood the test of time remarkably well. However, once high-field nuclear magnetic resonance spectroscopy made it possible to follow the titration of individual residues, it became clear that, despite the elegance of models involving only a few ionizable groups, this is unlikely to be the case for most proteins, particularly large proteins such as hemoglobin.

Many of the questions that one would like to ask about electrostatic effects in proteins are not readily accessible by experiment but can be addressed by computer models based upon classical electrostatic theory. Computer modeling has received increasing attention as the number of three-dimensional structures determined by X-ray crystallography and, more recently, by multidimensional nuclear magnetic resonance spectroscopy has grown, and as computers have become smaller and faster. The earliest model of electrostatic effects in proteins, however, preceded the determination of the first three-dimensional structure by several decades (Linderstrom-Lang, 1924).
Since that time, models have developed through several stages: the use of atomic coordinates, the incorporation of solvent accessibility, the development of grid methods to treat irregular surfaces, and the inclusion of the self-energy of partially or completely buried charged groups. As state-of-the-art computational packages became available, they were applied to a wide variety of systems. The earliest of these packages could calculate only potential surfaces. Recent developments make it possible to calculate both thermodynamic and kinetic parameters, for example, pKa values and rate constants.

Site-directed mutagenesis allows individual residues to be investigated experimentally and therefore complements computational modeling. This complementarity matters because both approaches have limitations. The effects of modifying an individual residue may propagate far beyond that residue. Modeling, for its part, is generally based upon a single, static structure determined under one specific set of conditions. In recent years, a very active area of research has been to measure experimentally how modifying charged groups affects function and to compare the results with the predictions of theoretical models.

This chapter discusses all these topics, beginning with elementary concepts and gradually increasing in difficulty. The second section reviews some basic chemical and structural principles. The third section discusses the methods that have been used to examine electrostatic effects experimentally, and the fourth section illustrates these approaches with several recent experimental studies. The fifth section deals with the development and current status of computational modeling, and the sixth section discusses some examples of theoretical studies. Future prospects are discussed in the seventh section. The examples are drawn from papers published from 1990 to 1997. No attempt has been made to review the literature exhaustively; rather, illustrative examples are discussed. These examples are necessarily rather specific, but we have attempted to point out their general significance.

Electrostatic effects in proteins have been the subject of a number of recent reviews. Experimental results and theoretical approaches have been reviewed by Matthew and colleagues (1985), Allewell and Oberoi (1991), and Nakamura (1996). Theoretical approaches and applications have been reviewed by Warshel and Russell (1984), Rogers (1986), Harvey (1989), Davis and McCammon (1990), Sharp and Honig (1990), Tomasi and Persico (1994), and Honig and Nicholls (1995). Two recent reviews emphasize the role of electrostatics in protein folding and stability (Dill and Stigter, 1995; Honig and Yang, 1995).
BASIC PRINCIPLES
Ionization and pKa Values
Every ionizable group can exist in both the protonated and the unprotonated form (HA and A⁻), which are in equilibrium:
$$\mathrm{HA} \rightleftharpoons \mathrm{H^+} + \mathrm{A^-}$$
Ka, the acid (hence "a") dissociation constant for this reaction, is defined as:
$$K_a = \frac{[\mathrm{A^-}][\mathrm{H^+}]}{[\mathrm{HA}]}$$
The pKa of a group is defined as:
$$\mathrm{p}K_a = -\log_{10} K_a$$
Taking logarithms of the definition of Ka gives the Henderson–Hasselbalch equation,
$$\mathrm{pH} = \mathrm{p}K_a + \log_{10}\frac{[\mathrm{A^-}]}{[\mathrm{HA}]}$$
so when the pH of the solution is equal to the pKa of the group, [A⁻] = [HA]. The larger the acid dissociation constant, the more acidic the group and the greater its tendency to release its proton; the larger the pKa value, the more basic the group and the greater its tendency to retain its proton.
Ionizable Groups in Proteins
The structures of the seven amino acids in proteins that have ionizable side chains are shown in Figure 1 together with the pKa value of the ionizable group in the sidechain of the free amino acid. In addition, the terminal amino and carboxyl
Electrostatic Effects in Proteins
[Figure 1 image: side-chain structures under the panel labels "aromatic", "sulphur containing", "negatively charged", and "positively charged", titled "Amino acids with ionizable side chains"; side-chain pK values of the free amino acids: Asp 3.9, Glu 4.3, His 6.0, Cys 8.3, Tyr 10.9, Lys 10.8, Arg 12.5.]
Figure 1. Molecular structures of the seven amino acids that have ionizable side chains, along with their pKa values.
groups in a protein are ionizable; in small unstructured peptides, their pKa values are typically ~8 and ~3.5, respectively. As discussed below, the pKa values of all these groups in proteins may differ from those in free amino acids and small peptides by several pH units, either because of the influence of other charged groups in the protein or because the group is in the mostly nonpolar environment of the protein.
Titration Curves and Isoelectric Points
A typical titration curve for a protein is shown in Figure 2. As the protein is titrated from low to high pH, its carboxyl groups will typically lose their protons first, changing their charge from neutral to negative. The last groups to lose their protons
[Figure 2 plot: protons bound or released (y-axis, ticks down to −15) versus pH (x-axis); curves labeled "forward titration" and "back titration".]
Figure 2. Forward and backward pH titration curves of ribonuclease (RNase). The number of protons bound or released by the protein is measured as acid or base is added to change the pH. This is done by titrating two samples, one with and one without protein. Any difference between the amounts of acid or base needed to bring the protein solution and the protein-free solution to a particular pH is due to the binding or release of protons by the protein.
will generally be the guanidino groups of arginine, changing the charge on these groups from positive to neutral. Side chains with intermediate pKa values will titrate at intermediate pH values; the exact sequence in which they titrate will depend upon their actual pKa values within the protein. The pH at which the protein has no net charge in pure water, with no ions present other than H+ and OH-, is called its isoionic point. There will generally be many positive and negative charges on the protein at this pH; however, their sums will be equal. In a salt solution the protein also binds other ions, so the pH at which its net charge (including bound ions) is zero, called its isoelectric point (pI), differs from the isoionic point and varies with ionic strength.
Dipoles in Proteins
When a pair of atoms linked by a covalent bond have different affinities for electrons (electronegativities), they share electrons unequally, creating a dipole. For example, the oxygen atom of the peptide carbonyl group has a partial negative charge, while the carbon atom of this group has a partial positive charge. Similarly, the nitrogen atom of the peptide N–H group has a partial negative charge, while its hydrogen has a partial positive charge. These partial charges contribute to the overall electrostatic properties of the protein. For example, because the C=O and N–H groups of an α-helix all point in the same direction along the helix axis, the entire helix has a net dipole, with a partial positive charge at the amino terminus and a partial negative charge at the carboxyl terminus. Hence, in isolated helical peptides, the helix dipole enhances the charges on the terminal amino and carboxyl groups. In proteins, the helix dipole is roughly equivalent to one-half of a unit charge at each end of the helix. The magnitude of the dipole effect increases with helix length up to about seven residues and then begins to plateau (Hol et al., 1978).
Dielectric Constant
The dielectric constant of a medium is a measure of its ability to shield charges and is given by:
$$\varepsilon = \frac{E_{\mathrm{vacuum}}}{E_{\mathrm{medium}}}$$
where E_vacuum is the energy of interaction between two charges in a vacuum and E_medium is the corresponding energy in the medium of interest. Water has a dielectric constant of 80 because it is a highly polar molecule that shields charges well. When an ion is surrounded by water, the water molecules orient themselves so as to partially neutralize its charge. For example, the water molecules around a cation such as the ammonium ion will orient themselves so that the partial negative charges on their oxygen atoms partially neutralize the charge on the ion. Conversely, the water molecules around an anion such as chloride will be oriented with their hydrogen atoms near the ion.
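The ratio above can be made concrete with Coulomb's law for point charges in a uniform medium, E = 332 q1q2/(εr), which gives kcal/mol when r is in ångströms and the charges are in units of the proton charge. The Python sketch below is a minimal illustration under the assumption of a single homogeneous dielectric; it deliberately ignores the boundary between protein and solvent, and the function name and sample distance are hypothetical.

```python
# Coulomb constant chosen so that energies come out in kcal/mol when charges
# are in units of the proton charge and distances are in angstroms.
COULOMB_KCAL_ANGSTROM = 332.0637


def coulomb_energy(q1: float, q2: float, r_angstrom: float, epsilon: float) -> float:
    """Interaction energy (kcal/mol) of two point charges in a uniform dielectric."""
    if r_angstrom <= 0 or epsilon <= 0:
        raise ValueError("distance and dielectric constant must be positive")
    return COULOMB_KCAL_ANGSTROM * q1 * q2 / (epsilon * r_angstrom)


if __name__ == "__main__":
    # An ion pair (+1, -1) 10 angstroms apart in vacuum, in media spanning the
    # range quoted for protein interiors, and in water.
    for eps in (1.0, 4.0, 20.0, 80.0):
        e = coulomb_energy(+1, -1, 10.0, eps)
        print(f"epsilon = {eps:5.1f}: E = {e:7.2f} kcal/mol")
    # The ratio E_vacuum / E_medium recovers the dielectric constant of water.
    print(coulomb_energy(1, 1, 10.0, 1.0) / coulomb_energy(1, 1, 10.0, 80.0))
```

For an ion pair 10 Å apart, the attraction is only about 0.4 kcal/mol in water (ε = 80) but about 8.3 kcal/mol at ε = 4, which illustrates why the low dielectric of the protein interior has such large energetic consequences.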
The nonpolar interior of a protein has a lower dielectric constant; estimates range from 4 to 20. When a charged group is buried in the nonpolar interior of a protein, its charge is no longer stabilized by the surrounding water. As a result, groups that are charged in their protonated state, such as the imidazole group of His or the amino group of Lys, lose their protons more readily than they do in water, and their pKa values decrease relative to model compounds. Conversely, groups that are charged in their unprotonated state, such as the carboxyl groups of Asp and Glu, ionize less readily than model compounds; hence their pKa values increase.
Factors Influencing the pKa Values of Ionizable Groups in Proteins
Apparent pKa values, which determine the charges on ionizable groups, can be estimated experimentally; understanding the factors that influence them, and how to model those factors computationally, is central to understanding the electrostatics of proteins. The pKa value of a group depends upon its intrinsic pKa (its pKa when totally exposed to solvent) and the electrostatic potential it experiences. The electrostatic potential in turn depends on three factors: the degree of exposure to
[Figure 3 image: electrostatic potential surfaces in three panels, labeled pH 2, pH 6, and pH 11.]
Figure 3. The electrostatic potential surface around a catalytic:regulatory (c:r) subunit pair in E. coli aspartate transcarbamylase at pH (a) 2, (b) 6, and (c) 11. The potential surfaces change dramatically with pH. At pH 6 one can clearly see the regions of positive potential concentrated in two areas: the active site (the lower area) and the nucleotide effector binding site (the upper area). Since the ligands for these sites are negatively charged, it is believed that the combination of the positive potential at the sites and the negative potential across the rest of the molecule may help guide the ligands to the sites. Once the ligands are at the sites, they form noncovalent electrostatic bonds with the protein. This figure was produced with the program DAMPS (H. Oberoi and N.M. Allewell, unpublished).
solvent of the group in the folded protein, its interaction with dipoles in the protein, and its interaction with titrating charges. The first two factors are often called the self-energy terms. The quantitative evaluation of these factors is discussed in the fifth section.
Potential Surfaces
The distribution of positive and negative charges in a protein is frequently asymmetric. This nonrandom distribution generates an electrostatic potential surface around the protein that may facilitate interactions with other macromolecules and the binding of ligands, enhance rates of catalysis, and modulate redox potentials. Molecular graphics has been used extensively to visualize and analyze these surfaces; for an example, see Figure 3. The theoretical approaches used to calculate potential surfaces are discussed in the fifth section; applications are discussed in the sixth section.
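The link between the electrostatic potential a group experiences and its pKa can be sketched numerically. Putting a proton on a site at potential φ costs an extra Fφ per mole whether the group is an acid or a base, so, assuming φ is not itself altered by the protonation event, the pKa shifts by −Fφ/(2.303RT): about one pH unit per 59 mV at 25 °C, downward for a positive potential. The Python sketch below combines that relation with the Henderson–Hasselbalch equation; the function names and the ±30 mV potentials are hypothetical, and the His value of 6.0 is simply the free-amino-acid pKa used as a stand-in for the intrinsic pKa.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def pka_shift_from_potential(phi_volts: float, temperature_k: float = 298.15) -> float:
    """Change in pKa for a site sitting at electrostatic potential phi (volts).

    Protonation adds one positive charge to the site whether the group is an
    acid or a base, so a positive potential disfavours the protonated form
    and lowers the pKa: delta_pKa = -F * phi / (ln(10) * R * T).
    """
    return -F * phi_volts / (math.log(10.0) * R * temperature_k)


def apparent_pka(intrinsic_pka: float, phi_volts: float, temperature_k: float = 298.15) -> float:
    """Intrinsic pKa corrected for the potential created by the rest of the protein."""
    return intrinsic_pka + pka_shift_from_potential(phi_volts, temperature_k)


def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of molecules in which the site carries its proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # Hypothetical His side chain placed at negative, zero, and positive potential.
    for phi_mv in (-30.0, 0.0, 30.0):
        pka = apparent_pka(6.0, phi_mv / 1000.0)
        print(f"phi = {phi_mv:+5.1f} mV: pKa = {pka:.2f}, "
              f"fraction protonated at pH 7 = {fraction_protonated(7.0, pka):.3f}")
```

With these assumptions a potential of ±30 mV shifts the pKa by about ∓0.5 unit, enough to change noticeably the fraction of the group that is charged near neutral pH.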
EXPERIMENTAL APPROACHES
Ion Exchange Chromatography
Ion exchange chromatography is a widely used procedure for protein separation and purification that depends upon electrostatic interactions with ionizable groups attached to a solid support or resin. The strength of the interaction with the resin varies with the sign and magnitude of the net charge on the protein, making it possible to elute the proteins in a mixture differentially (Figure 4). The most widely used solid supports are cellulose, dextran, agarose, acrylamide, and polystyrene with a hydrophilic polymer coating. They can be rendered chemically reactive, allowing different functional groups to be attached. They are also hydrophilic, so they tend not to denature proteins. The most widely used functional groups are shown in Figure 5. Some are negatively charged near neutral pH and hence bind positively charged proteins (cation exchangers); others are positively charged near neutral pH and hence bind negatively charged proteins (anion exchangers). The positively charged diethylaminoethyl (DEAE) group is the most widely used anion exchanger, and the negatively charged carboxymethyl (CM) group is the most widely used cation exchanger. In an ion exchange experiment, the resin is first washed with an equilibrating buffer until it no longer binds or releases ions. The resin is then said to be equilibrated with the buffer and is ready to exchange its counterions for other charged molecules such as proteins. The resin is then poured into a column fitted with tubes at the bottom and top that lead to a fraction collector and a solvent reservoir, respectively. When the sample is applied, proteins that do not interact with the column pass through in the void volume, while proteins that interact with the column because of their charge either bind completely or are retarded. Proteins that bind to the column can be eluted by altering the pH of the buffer flowing
[Figure 4 plot: absorbance at 280 nm and buffer pH versus elution volume (mL); arrows on the x-axis mark the protein peaks.]
Figure 4. An example of a typical elution profile from an anion-exchange column in the latter stages of a protein purification. As the pH of the buffer flowing through the column is increased, the resin "exchanges" the proteins for anions (OH-) in the buffer. The order of elution follows the net negative charge on the proteins, with protein "A" being the least negatively charged and thus the easiest to exchange out. The arrows on the x-axis mark the elution volume of the peak of each protein.
through the column, so as to reduce the charge on the protein. Alternatively, or in addition, the ionic strength of the buffer may be increased so as to weaken the interaction of the protein with the resin. More detailed discussions of the principles and practice of ion exchange chromatography can be found in the review article by Rossomando (1990).
Electrophoresis and Isoelectric Focusing
Electrophoresis and isoelectric focusing both depend upon the movement of charged molecules in an electric field. Experiments are typically carried out by applying a small volume of sample to the top of a solid support and driving the proteins through the support by applying an electric field. Because the charges on the various proteins in the sample differ, different proteins move through the support at different rates. When the experiment is completed, the proteins can be visualized by staining them with a colored dye (generally Coomassie blue) or by precipitation with silver nitrate (Figure 6). The solid support, or gel, is generally polyacrylamide crosslinked with methylene bisacrylamide, in a reaction catalyzed by ammonium persulphate and N,N,N′,N′-tetramethylethylenediamine (TEMED) (hence polyacrylamide gel electrophoresis, or PAGE). The pore size of the gel can be adjusted by varying the concentration of acrylamide; the average pore size should be roughly half the average size of the protein molecules being separated. The gel is generally cast between glass plates
Designation | Ionizable group | Matrix available
Anion exchangers
Aminoethyl- | –O–CH2–CH2–NH2 | Fibrous cellulose
Diethylaminoethyl- | –O–CH2–CH2–N(C2H5)2 | Fibrous cellulose, microgranular cellulose, Sephadex, polyacrylamide gel
Triethylaminoethyl- | –O–CH2–CH2–N⁺(C2H5)3 | Fibrous cellulose
Guanidoethyl- | –O–CH2–CH2–NH–C(=NH)–NH2 | Fibrous cellulose, Sephadex
ECTEOLA- | Mixed | Fibrous cellulose
Cation exchangers
Carboxymethyl- | –O–CH2–COOH | Fibrous cellulose, microgranular cellulose, Sephadex, polyacrylamide gel
Phospho- | –O–PO(OH)2 | Fibrous cellulose
Sulfoethyl- | –O–CH2–CH2–SO3H | Fibrous cellulose, Sephadex
Figure 5. Widely used functional groups for ion exchange chromatography.
as a slab with several wells so that several samples can be run at one time. When separations must be achieved within minutes, capillaries are used. Electrophoretic separations are carried out in buffers of low ionic strength to maximize the differences in mobility of the proteins being separated and to maintain a low current so that the gel does not overheat. Buffer systems that can be used at
Figure 6. A gel electrophoretic separation of the proteins present in various subfractions during the preparation of ornithine transcarbamylase (OTCase) from E. coli. Each band contains at least one protein species. The intensity of the band staining is roughly proportional to the amount of that species that is present. The rightmost lane shows significant enrichment in OTCase (the large, dark band).
different pH values have been developed. Another type of electrophoresis involves addition of the detergent sodium dodecyl sulphate (SDS), which dissociates multisubunit proteins and generates denatured protein-SDS complexes whose mobility decreases approximately linearly with the logarithm of their molecular weight. The SDS coats the denatured protein so that its intrinsic charge is masked; because the amount of SDS bound is roughly proportional to the length of the polypeptide chain, the complexes have similar charge-to-mass ratios and are separated mainly by size. SDS-PAGE is widely used to determine approximate molecular weights by comparing the mobility of proteins of unknown molecular weight with that of molecular weight standards. Isoelectric focusing separates proteins on the basis of isoelectric point, rather than charge, and depends upon the fact that the electrophoretic mobility of a protein is zero at its isoelectric point. A pH gradient is first created within the gel by electrophoresing a complex mixture of low-molecular-weight polyampholytes (molecules with several ionizable groups). The most acidic polyampholytes concentrate near the anode and the most basic near the cathode, creating a pH gradient across the gel. The sample is then applied to the gel, and each protein migrates in the electric field until it reaches the point in the gel where the pH is equal to its isoelectric point, where it becomes stationary. At the conclusion of the experiment, the positions of the various proteins in the sample can be determined by staining, and their isoelectric points can be determined by comparing their positions with those of standards with known isoelectric points. More detailed
discussions of electrophoresis and isoelectric focusing are available in reviews by Garfin (1990a and b).
Dependence of Global Properties on pH
Because the ionizable groups that are involved in protein folding, stability, and function can be titrated by varying the pH, many investigators have attempted to identify and characterize these groups by examining the effects of pH on stability or function. Typically in these studies, conformation, assembly, ligation, or enzymatic activity is monitored as a function of pH and the data are fit to models involving various numbers of ionizable groups. This approach was first used to define the ionizable groups involved in substrate binding and catalysis in enzymes. The idea behind such studies is that if one observes, for example, an increase in substrate binding as the pH is increased, and if the midpoint of that increase occurs at, for example, pH 8.2, then there should exist a residue with a pKa of 8.2 that is involved in substrate binding. As discussed above, however, many factors influence the apparent pKa values of specific groups, so that such an experimentally derived pKa often reflects a global property of the protein rather than characterizing one isolated residue. In addition, several residues titrating together can produce a single apparent pKa. In carrying out these studies, it is important to use mixed buffer systems in which the ionic strength is independent of pH (Ellis and Morrison, 1982). Many descriptions of the general strategy for such studies exist; see, for example, Cleland (1977). Pitfalls in this approach and the particular problems posed by overlapping kinetically influential ionizations have been discussed by Brocklehurst (1994). Examples of recent studies are described in the fourth section.
Site-Directed Mutagenesis
Site-directed mutagenesis has been widely used to test the hypothesis that a particular ionizable group is involved in a given process. If mutating a particular residue eliminates or alters the process of interest, then that residue is at least involved in, if not primarily responsible for, the process. Directed mutagenesis has also been used to assess the interaction energies of ion pairs, an approach pioneered by Fersht and colleagues (Thomas et al., 1985). In this approach, the interaction between two charged residues is assessed by comparing the effects of mutating each residue individually with the effect of mutating both residues simultaneously. Interactions between charges that are separated by long distances in proteins are generally believed to be facilitated both by the low dielectric constant of the protein interior and by the global charge interactions discussed above. Regardless of the exact mechanism, whenever site-directed mutagenesis is used, it is important to bear in mind that the effects of the mutation may not be restricted to the site but can propagate to other parts of the structure. Mutational approaches to protein stability were reviewed by Alber (1989).
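The single- versus double-mutant comparison described above reduces to simple bookkeeping, often called a double mutant cycle. The Python sketch below assumes that stability is reported as a free energy of unfolding (positive values mean more stable) for the wild type, each single mutant, and the double mutant; the coupling energy is the sum of the single-mutant effects minus the double-mutant effect and is zero if the two residues act independently. The function name and the numbers are hypothetical, and a nonzero result can also reflect structural changes propagated by the mutations rather than a direct charge-charge interaction.

```python
def coupling_energy(dg_wt: float, dg_a: float, dg_b: float, dg_ab: float) -> float:
    """Double-mutant-cycle coupling energy, in the same units as the inputs.

    Inputs are free energies of unfolding (positive = stable) for the wild
    type, mutant A, mutant B, and the double mutant AB.  If residues A and B
    do not interact, removing both costs exactly the sum of removing each
    alone, and the coupling energy is zero.
    """
    ddg_a = dg_wt - dg_a     # destabilisation caused by mutating A alone
    ddg_b = dg_wt - dg_b     # destabilisation caused by mutating B alone
    ddg_ab = dg_wt - dg_ab   # destabilisation caused by mutating both
    return ddg_a + ddg_b - ddg_ab


if __name__ == "__main__":
    # Hypothetical stabilities in kcal/mol, not taken from any real protein.
    dg = {"wt": 10.0, "A": 8.5, "B": 8.8, "AB": 8.0}
    e = coupling_energy(dg["wt"], dg["A"], dg["B"], dg["AB"])
    print(f"coupling energy = {e:.2f} kcal/mol")
```

Here the double mutant is less destabilized (2.0 kcal/mol) than the sum of the single mutants (2.7 kcal/mol), so the A–B interaction is estimated to stabilize the wild type by about 0.7 kcal/mol.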
Individual Site Titrations
Nuclear magnetic resonance (NMR) spectroscopy was first used to determine individual site pKa values some 30 years ago. The resonance signals of many residues are pH sensitive, so that pKa values can be derived by fitting these data to relatively simple binding equations. Use of NMR eliminates many of the uncertainties that exist when attempts are made to resolve individual site pKa values from global titration curves by fitting the data to models. Although early studies emphasized His residues, multidimensional methods make it possible to determine pKa values for other residues by monitoring the chemical shifts of amide and aliphatic protons. Correct assignment of resonance signals to the residues that produce them is critical and is often accomplished by site-directed mutagenesis. When the resonance being monitored is not that of the titrating group itself but, for example, that of a nearby residue that experiences a change in its environment as the pH changes, identifying the group that actually titrated can also be challenging. Nevertheless, this is an extremely powerful approach that is becoming widely used.
Linkage Theory
All of the processes that a protein undergoes (ligand binding, conformational changes, changes in state of assembly) are equilibria that are thermodynamically coupled to one another, so that the position of any one equilibrium is affected by all the others. Ionization reactions, in particular, are coupled to most if not all of the other processes proteins undergo. This is, of course, the basis of the dependence of protein stability and ligand affinity on pH and ionic strength. Conversely, the ionization behavior of titratable groups on a protein depends on the conformation and molecular dynamics of the protein, the state of ionization of every other group, the ligands and ions that are bound, and the protein's interactions with other macromolecules. The theory of thermodynamically linked processes in macromolecular systems was developed by Wyman (1964), Weber (1975), Gill (Wyman and Gill, 1990), Ackers (Ackers and Halvorsen, 1974), and others and has been the subject of numerous monographs and reviews; cf. Cantor and Schimmel (1980). Linkages between proton and ion binding are discussed by Garcia-Moreno (1994).
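One consequence of linkage can be sketched for a single ionizable group whose pKa changes from pKa,free to pKa,bound when a ligand binds. The observed binding constant then varies with pH as K_obs = K0(1 + 10^(pKa,bound − pH))/(1 + 10^(pKa,free − pH)), where K0 is the binding constant of the deprotonated protein, and Wyman's linkage relation states that −d log K_obs/d pH equals the number of protons taken up per ligand bound. The Python sketch below checks that relation numerically. The single-group model is an assumption, the function names are hypothetical, and the values 7.2 and 9.4 are borrowed only for illustration from the L-Asp binding results of Leger and Herve (1988) discussed in the fourth section.

```python
import math


def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch fraction of the group carrying its proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def log10_k_obs(ph: float, log10_k0: float, pka_free: float, pka_bound: float) -> float:
    """log10 of the observed binding constant for the one-group linkage model."""
    return (log10_k0
            + math.log10(1.0 + 10.0 ** (pka_bound - ph))
            - math.log10(1.0 + 10.0 ** (pka_free - ph)))


def protons_taken_up(ph: float, pka_free: float, pka_bound: float) -> float:
    """Protons bound per ligand bound: protonation of the complex minus that of the free protein."""
    return fraction_protonated(ph, pka_bound) - fraction_protonated(ph, pka_free)


if __name__ == "__main__":
    pka_free, pka_bound = 7.2, 9.4
    h = 1e-5  # step for a central finite difference of log K_obs
    for ph in (5.0, 6.0, 7.0, 8.0, 9.0, 10.0):
        slope = (log10_k_obs(ph + h, 0.0, pka_free, pka_bound)
                 - log10_k_obs(ph - h, 0.0, pka_free, pka_bound)) / (2 * h)
        print(f"pH {ph:4.1f}: protons taken up = "
              f"{protons_taken_up(ph, pka_free, pka_bound):.3f}, "
              f"-dlogK/dpH = {-slope:.3f}")
```

Under these assumptions roughly 0.4 proton is taken up per ligand bound near pH 7, and the uptake, like the pH dependence of binding, falls toward zero when the pH lies well below or well above both pKa values.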
EXAMPLES OF EXPERIMENTAL STUDIES
Catalytic Mechanisms
The catalytic subunit of aspartate transcarbamylase provides an instructive example of the complexities that can be encountered when attempting to identify ionizable groups involved in a catalytic mechanism using pH studies. Aspartate transcarbamylase is an extensively studied allosteric enzyme that catalyzes the transfer of the carbamyl group of carbamyl phosphate to the α-amino group of L-Asp. The
catalytic subunits lack the allosteric properties of the holoenzyme but retain catalytic activity and can be purified and studied in isolation. Leger and Herve (1988) studied the pH dependence of catalysis and of the binding of competitive inhibitors and concluded that binding of carbamyl phosphate involves a group with a pKa of 8.2 that shifts to 7 upon binding of this substrate. Binding of L-Asp was concluded to involve a group with a pKa of 7.2 that shifts to 9.4 when L-Asp binds. Such shifts in pKa upon binding occur in many systems. Two groups with pKa values of 7.2 and 9.5 are also involved in catalysis (Leger and Herve, 1988; Turnbull et al., 1992). Since several of these pKa values are similar, the question arises: how many different groups on the protein are involved? Turnbull and colleagues (1992) addressed this question by studying the temperature dependence of the pKa values (which allows enthalpies of ionization to be determined) and the pH dependence of the binding of substrate analogues. Since the group with a pKa of about 9 in the binary complex and the group with a pKa of about 7 in the ternary complex have temperature-independent enthalpies of ionization, the simplest hypothesis is that they are the same group. The groups with pKa values of about 7 in the binary complex and about 9.5 in the ternary complex are also postulated to be the same. This reasoning thus yields a model in which the number of groups in the protein involved in binding and catalysis is smaller than the total number of pKa values observed. Turnbull and colleagues (1992) also concluded that the catalytically productive complex with carbamyl phosphate exhibits reverse protonation (i.e., the group with a pKa of 7 is protonated, whereas the group with a pKa of 9.1 is unprotonated). This means that less than 1% of the enzyme is in the correct protonation state for catalysis at the pH at which the enzyme is most active.
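The "less than 1%" figure follows from the two pKa values if the groups are assumed to titrate independently: the productive form needs the pKa-7 group protonated and the pKa-9.1 group deprotonated, and the product of the two Henderson–Hasselbalch fractions can never be large. The Python sketch below scans pH for the maximum; the independence assumption and the function name are illustrative and are not taken from Turnbull and colleagues (1992).

```python
def fraction_productive(ph: float,
                        pka_must_be_protonated: float = 7.0,
                        pka_must_be_deprotonated: float = 9.1) -> float:
    """Fraction of enzyme in the reverse-protonated (productive) state.

    Assumes the two groups titrate independently of each other.
    """
    protonated = 1.0 / (1.0 + 10.0 ** (ph - pka_must_be_protonated))
    deprotonated = 1.0 / (1.0 + 10.0 ** (pka_must_be_deprotonated - ph))
    return protonated * deprotonated


if __name__ == "__main__":
    grid = [6.0 + 0.01 * i for i in range(401)]  # pH 6.00 to 10.00
    best_ph = max(grid, key=fraction_productive)
    print(f"maximum fraction {fraction_productive(best_ph):.4f} at pH {best_ph:.2f}")
```

The maximum, about 0.7%, falls near pH 8.05, midway between the two pKa values, where each factor equals 1/(1 + 10^1.05), or about 0.08.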
In contrast, the protonation states of residues in the ternary complex with carbamyl phosphate and L-Asp are consistent with their pKa values. A parallel study of the holoenzyme and several single-site mutants indicates that reverse protonation cannot generate the observed pH dependence of this form of the enzyme (Yuan et al., 1996). The pH dependences of the holoenzyme indicate that the mechanism of productive binding of L-Asp to the binary complex (with carbamyl phosphate) differs between the catalytic trimer and the holoenzyme, while the Michaelis or ternary complex (enzyme + both substrates) appears very similar for both assembly states. In addition, thermodynamic cycles were used in this study to demonstrate long-range coupling between titrating sites and two different subunit interfaces in the enzyme. The requirement that thermodynamic cycles must close is a powerful constraint that may be useful in developing models from pH studies in many systems.
Long-Range Interactions
Electrostatic interactions persist over longer distances than other interactions influencing protein folding and function because their dependence on distance is
proportional to 1/r and because the low dielectric medium within the protein enhances their strength. Interest in their potential roles in long-range effects in proteins has been increasing, and a number of recent studies have addressed this issue. Loewenthal and colleagues (1993) evaluated long-range coulombic interactions between surface charges in the proteins barnase and subtilisin and compared their experimental results with the predictions of the DelPhi computational package (discussed in the fifth section). The pKa value of a His residue in each protein was evaluated experimentally: in barnase by monitoring the fluorescence of a Trp residue that is quenched by titration of the His, and in subtilisin by determining the pH dependence of kcat/Km. The coulombic interactions of distant charged side chains with the protonated forms of the His residues were measured from the changes in the pKa values of these residues upon mutation of the distant charged side chains. Interaction energies were small, in the range of 0.3 to 0.5 kcal·mol⁻¹ at 12 Å, and fell off with distance. The effects of multiple mutations were frequently additive. Effects were larger in subtilisin than in barnase, possibly because of reduced solvent exposure; hence, the magnitudes of these effects might be expected to increase for more buried charges. Agreement between the experimental results and the predictions of DelPhi was reasonable. Long-range electrostatic interactions that affect the catalytic activity of cysteine proteinases have been demonstrated with procaricain expressed in E. coli by Taylor and colleagues (1994). The catalytic activity of this enzyme depends upon a His-Cys ion pair at the active site; however, the pH dependence of the catalytic parameters is complex, implying the involvement of other groups. Yuan and colleagues (1996) examined the effects of mutating two tyrosine residues involved in a large cluster at the interface between catalytic subunits, about 15 Å from the active site of E.
coli aspartate transcarbamylase. This interface is disrupted in the T→R transition of the molecule. The finding of substantial quantitative differences between the effects of the two Tyr→Phe substitutions at sites nearly equidistant from the active site demonstrated that electrostatic communication between this interface and the active site is path-dependent, rather than simply depending on distance.
Site-Directed Mutagenesis
A study by Bakir and colleagues (1993), in which cassette mutagenesis was used to probe the catalytic mechanism and pH dependence of glucoamylase, illustrates the use of site-directed mutagenesis to probe enzyme mechanism. Nine amino acids at the active site were mutated. Three of these were Asp→Glu mutations or vice versa, four replaced uncharged residues with Asp, and two inserted Asp or Gly. Several mutations produced large changes in kcat or Km, indicating that the mutated residues are involved in either catalysis or substrate binding. Despite changes in the overall charge at the active site, effects on pH dependence were small, leading the authors to conclude that "modifying enzyme pH behavior by mutagenesis is still an
unpredictable endeavor." This conclusion underlines the importance of determining the three-dimensional structures of mutant proteins, of defining the structural changes produced by a mutation, and of thoroughly investigating the physical basis of its functional effects in solution; there are no simple answers. Barker and colleagues (1991) examined the effects of several mutations in yeast iso-1-ferricytochrome c on its titration behavior. As is often the case, the titrations were not completely reversible, probably because of structural changes in the protein at extreme pH. In this case, titrating from high to low pH was reversible, while titrating in the opposite direction was not. The titration behavior of the protein was also temperature-dependent. This was attributed to temperature-dependent pKa values, although temperature-dependent conformational changes in the protein, perhaps linked to changes in pKa values, are another possibility. This study also illustrates the use of difference titrations to define pKa shifts.
Single-Site Titrations
Use of NMR to titrate individual residues is illustrated by a study of human thioredoxin carried out by Forman-Kay and colleagues (1992). Although more than two hundred chemical shift titration curves were measured over a wide pH range, pKa values could be calculated only for a single His, the two active-site Cys residues, and a number of Asp and Glu groups. Many groups did not titrate completely, and many titration curves were complex because of interactions between titrating groups, long-range electrostatic effects, and pH-induced conformational changes. These complexities limit the extent to which the data can be interpreted and provide an instructive illustration of the difficulty of extracting specific molecular information with this experimental approach at its present level of development.
Determining Surface Charge
The interactions of a protein with other molecules are likely to be strongly influenced by its surface charge. Computing the electrostatic potential surface is a popular approach to predicting and analyzing these interactions. Experimental measurements, however, are also important because three-dimensional structures are not available for many proteins and the computational predictions need to be verified experimentally in any case. Alexiev and colleagues (1994) have developed an approach that may be applicable to many other systems. They attached the optical pH indicator fluorescein to cysteines introduced at selected sites in bacteriorhodopsin by site-directed mutagenesis, determined its pKa by titration, and used the pKa value to calculate surface charge density. When the surface charge was perturbed by site-directed mutagenesis, the experimentally determined surface charge shifted as expected. The specific residues that contribute to the surface charge in the wild-type protein were identified by site-directed mutagenesis. Another new and exciting approach is the use of protein charge ladders (Gao et al., 1996). A family of different charge isomers of a protein is constructed using random, limited
acetylation of surface lysines. Charge isomers are then separated by capillary electrophoresis, and the ligand binding affinities of the charge isomers reflect the contributions of surface charge to ligand binding.
Interactions with Lipids
Interest in electrostatic effects in protein-lipid interactions has been increasing, and several recent theoretical papers on this subject are discussed in the sixth section. Here we discuss an experimental paper. Monette and Lafleur (1995) investigated the role of charge-charge interactions in the lytic activity of melittin, a positively charged peptide that breaks up lipid membranes into small lipid-peptide particles. Formation of these particles was monitored by 31P-NMR spectroscopy in membranes containing various amounts of charged lipids. The amount of melittin bound at the surface of the membrane increased as the negative charge on the membrane increased, but the amount of lysis decreased. The authors propose that lysis requires electroneutrality at the surface: when the negative charge on the membrane is high, large amounts of melittin are bound, but electroneutrality is never achieved. Since electrostatic models for membranes are fairly well developed, it would be possible and desirable to test this hypothesis computationally.
THEORETICAL APPROACHES
Macroscopic versus Microscopic Models
All of the theoretical models that have been developed to analyze electrostatic effects in proteins fall into one of two broad categories. Microscopic models explicitly treat each atom in the protein and each solvent molecule and ion in the surrounding solution. Macroscopic, or continuum, models describe the properties of groups of molecules or ions in terms of averages. The earliest models were macroscopic; as computational power has increased, however, it has become possible to incorporate more and more atomic-level detail. Nevertheless, the most widely used models at this time still treat the solvent as a continuum.
Modeling electrostatic effects accurately poses several problems. Because each ionizable group on a protein interacts with every other group, the calculation presents a formidable computational challenge. Electrostatic effects are also medium-dependent, and boundary effects arise at the interface between two media, for example, between the protein and the solvent. Because biological macromolecules bind many ions and are conformationally flexible, the results are very sensitive to the ionic strength and pH of the surrounding solvent. Moreover, the complex interactions between proteins and the surrounding solvent cannot yet be treated exactly, so simplifying assumptions about the system are required.
The first electrostatic model was formulated by Linderstrom-Lang, who treated the protein as a sphere with its net charge distributed uniformly (smeared) over the surface. Under these conditions, the electrostatic free energy of the protein is proportional to the square of the net surface charge (Linderstrom-Lang, 1924). This model predicts that molecules of the same protein will repel each other at all pH values except their isoelectric point. Although it accounts for isoelectric precipitation, it is inconsistent with the many observations indicating that electrostatic effects can play a positive role in self-assembly.
Solutions of the Poisson-Boltzmann Equation
Virtually all modern electrostatic models are based on the Poisson-Boltzmann equation (of which Coulomb's law is a special case). The nonlinear Poisson-Boltzmann equation is:
$$\nabla \cdot \left[\varepsilon(\mathbf{r})\,\nabla\phi(\mathbf{r})\right] - \bar{\kappa}^{2}(\mathbf{r})\,\sinh\left[\phi(\mathbf{r})\right] + 4\pi\rho(\mathbf{r}) = 0$$
where $\phi(\mathbf{r})$ is the dimensionless electrostatic potential expressed in units of $kT/q$, $k$ is the Boltzmann constant, $T$ is the absolute temperature, $q$ is the charge on a proton, $\varepsilon$ is the dielectric constant, $\rho$ is the charge density (in units of proton charge), and $\mathbf{r}$ is a position vector (McQuarrie, 1976). $\bar{\kappa} = \varepsilon^{1/2}\kappa$, where $\bar{\kappa}$ is the modified Debye-Hückel parameter and $1/\kappa$ is the Debye screening distance, given by:
$$\kappa^{2} = \frac{8\pi q^{2} I}{\varepsilon k T}$$
where $I$ is the ionic strength. Early treatments linearized the sinh term, using the approximation $\sinh[\phi(\mathbf{r})] \approx \phi(\mathbf{r})$. This approximation cannot be used with highly charged molecules and at high ionic strengths, and modern computers have eliminated the need for it. Even the nonlinear Poisson-Boltzmann equation does not accurately describe systems in which the movements of small ions are highly correlated. For further discussion of the history and limitations of the Poisson-Boltzmann equation, see Dill and Stigter (1995).
Prior to the 1970s, all applications of the Poisson-Boltzmann equation to biopolymers involved fairly simple models that could be solved analytically. The most widely used protein model is illustrated schematically in Figure 7; its analytical solution was derived by Tanford and Kirkwood (1957). It consists of a low-dielectric sphere with point charges at a fixed distance beneath the surface, surrounded by an ion exclusion shell and embedded in a high-dielectric medium. Pairwise interaction energies are calculated with distances derived from crystal structures; the treatment of solvation (Born) energies is approximate, and charge-dipole interactions are neglected. Even with an analytical solution, the calculation must be carried through hundreds of iterations, since each charge interacts with every other charge.
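The screening parameter and the linearization discussed above can be put in familiar units with a short calculation. The sketch below is an illustration under stated assumptions, not part of the original treatment: it evaluates the Debye screening distance 1/κ for a 1:1 electrolyte in water at 25 °C using the SI form of κ² (the expression above appears to be the Gaussian-unit form, with I as an ion number density), assumes a relative permittivity of 78.5, and then shows how quickly the approximation sinh φ ≈ φ degrades as the dimensionless potential φ (in units of kT/q) grows. The helper names `debye_length_nm` and `linearization_error` are hypothetical.

```python
import math

# Physical constants in SI units (CODATA values)
E_CHARGE = 1.602176634e-19   # proton charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro constant, 1/mol
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m


def debye_length_nm(ionic_strength_molar, eps_r=78.5, temperature_k=298.15):
    """Debye screening distance 1/kappa in nm (eps_r = 78.5 is an assumed value for water).

    SI counterpart of kappa^2 = 8*pi*q^2*I/(eps*k*T):
    kappa^2 = 2 * N_A * e^2 * (1000 * I) / (eps_r * eps0 * k * T),
    with I converted from mol/L to mol/m^3.
    """
    kappa_sq = (2.0 * N_A * E_CHARGE**2 * 1000.0 * ionic_strength_molar
                / (eps_r * EPS0 * K_B * temperature_k))
    return 1e9 / math.sqrt(kappa_sq)


def linearization_error(phi):
    """Relative error of replacing sinh(phi) by phi, for phi > 0 in units of kT/q."""
    return (math.sinh(phi) - phi) / math.sinh(phi)


for ionic in (0.01, 0.1, 0.15):
    print(f"I = {ionic:4.2f} M -> 1/kappa = {debye_length_nm(ionic):.2f} nm")

for phi in (0.1, 1.0, 2.0, 4.0):
    print(f"phi = {phi:3.1f} kT/q -> error of sinh(phi) ~ phi: {linearization_error(phi):.1%}")
```

At 0.1 M the screening distance comes out just under 1 nm. The linearization error is well below 1% at φ = 0.1 but roughly 45% at φ = 2, which is why the linearized equation is unreliable for highly charged molecules.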
Figure 7. Schematic depictions of analytical models for solution of the Poisson-Boltzmann equation. (A) The Tanford-Kirkwood model for the protein-solvent interface. Region I, the inner sphere, defines the low-dielectric protein; ionizable residues are modeled as point charges within (i) and on the surface (j) of this region. Region II, the second spherical shell, defines the ion exclusion layer, a region of high dielectric. The external or bulk solution, region III, also has a high dielectric and contains mobile ions (Tanford and Kirkwood, 1957). (B) The States and Karplus model (Delepierre et al., 1987) does not include the ion exclusion layer.
In early applications of the Tanford-Kirkwood model, the distance at which the charges were placed beneath the surface was used as an adjustable parameter (the burial factor), selected to optimize agreement with experimental protein titration curves. The development of methods for calculating the solvent accessibility of individual atoms from crystal structures (Lee and Richards, 1971) paved the way for replacing this average, adjustable burial factor with a specific, structurally determined burial factor for each group. Modified Tanford-Kirkwood theory had several very attractive features. Calculations could be performed readily, and the theory was formulated so that the calculations directly provided a complete set of pKa values. These pKa values could be compared readily with experimental results, generally agreed well with them, and could be used to develop models of biochemical function. Nevertheless, the approach has at least three obvious limitations. First, because the protein is modeled as a sphere, the complex dielectric boundary between the protein and the solvent is neglected. Since this boundary is critically involved in molecular recognition and binding, it is likely to be important, particularly since the shape of the dielectric boundary makes a substantial contribution to the electrostatic potential surface in simple systems. Second, charge-dipole interactions are ignored. Third, solvent accessibility is introduced in a way that lacks a sound physical basis.
As more powerful computers became available in the 1980s, it became feasible to develop more complex models and to solve the Poisson-Boltzmann equation iteratively by numerical methods rather than analytically. The most widely used numerical method is the finite difference method, which was introduced by Warwicker and Watson in 1982 and is the basis of a widely used commercially available package, DelPhi, marketed by Molecular Simulations Inc. In this approach, the molecule is mapped onto a three-dimensional grid whose mesh can be adjusted. Ionizable atoms are assigned to grid points, and the electrostatic potential at each grid point is calculated using the finite difference approximation of the Poisson-Boltzmann equation,
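$$\phi_0 = \frac{\sum_{i=1}^{6} \varepsilon_i \phi_i + 4\pi q_0/h}{\sum_{i=1}^{6} \varepsilon_i + \bar{\kappa}^{2} h^{2}\left[1 + \dfrac{\phi_0^{2}}{3!} + \dfrac{\phi_0^{4}}{5!} + \cdots + \dfrac{\phi_0^{2n}}{(2n+1)!}\right]}$$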
where the nonlinear term is represented as a series, $h$ is the grid spacing in Å, $\phi_0$ is the electrostatic potential at the central grid point, $q_0$ is the charge at this grid point, $\varepsilon_i$ is the dielectric constant assigned to the link between the central point and neighbor $i$, and the summations are over the six neighboring grid points ($i$ = 1 to 6) (Jayaram et al., 1989; Warwicker and Watson, 1982; Klapper et al., 1986).
The accuracy of the potentials obtained from these calculations is highly dependent on grid spacing, but the time required for the calculations increases steeply with the number of grid points. One approach to reducing the time is focusing (Gilson et al., 1989), in which a finer grid is used only in the vicinity of the ionizable groups of particular interest, with potentials calculated on coarser grids used to set its boundary values. A more powerful approach is the multigrid method, in which the solution on a given grid (generally the finest grid) is obtained by iterating over a hierarchy of coarser grids (Oberoi and Allewell, 1993; Oberoi et al., 1995; Holst and Saied, 1993; Holst et al., 1994a and b). The key advantage of this technique is that the solution is improved iteratively by solving the problem on the coarser grids, where the computational cost is low, with only infrequent visits to the finer grids, where the cost is high. The multigrid approach can equally be combined with focusing. Recently developed approaches based on the boundary element method (Zhou, 1993) and the finite element method (You and Harvey, 1993) provide a more accurate description of the macromolecular surface than the cubic grid used in finite difference solutions. In deciding which algorithm to use, careful consideration must be given to the balance between the required accuracy and precision and the computational time involved (Oberoi and Allewell, 1993). The quantity calculated when the Poisson-Boltzmann equation is solved by numerical methods is the electrostatic potential.
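The finite difference update above can be sketched in a few lines of NumPy. This is a minimal illustration under several assumptions, not DelPhi's algorithm: it uses plain Jacobi sweeps (production codes use over-relaxation, focusing, or multigrid), holds the potential at zero on the outer faces of the box, takes the dielectric on each grid link as the arithmetic mean of the two grid-point values, and works in the reduced units of the equation (charges in proton units, potential in kT/q). The names `solve_fd_npbe`, `sinh_over_phi` and `_shift` are hypothetical.

```python
import numpy as np


def sinh_over_phi(phi):
    """The series 1 + phi^2/3! + phi^4/5! + ..., summed exactly as sinh(phi)/phi (limit 1 at 0)."""
    out = np.ones_like(phi)
    nonzero = np.abs(phi) > 1e-8
    out[nonzero] = np.sinh(phi[nonzero]) / phi[nonzero]
    return out


def _shift(a, axis, step):
    """Interior block of `a` displaced by `step` (+1 or -1) grid points along `axis`."""
    idx = [slice(1, -1)] * 3
    idx[axis] = slice(1 + step, a.shape[axis] - 1 + step)
    return a[tuple(idx)]


def solve_fd_npbe(q, eps, kappa_bar, h=1.0, max_iter=5000, tol=1e-7):
    """Jacobi iteration of the finite-difference nonlinear Poisson-Boltzmann update.

    q: (n, n, n) charges on grid points; eps: (n, n, n) dielectric constants;
    kappa_bar: scalar or (n, n, n) modified Debye-Huckel parameter; h: grid spacing.
    Returns the potential and the number of sweeps used.
    """
    q = np.asarray(q, dtype=float)
    eps = np.asarray(eps, dtype=float)
    kb2 = np.broadcast_to(np.asarray(kappa_bar, dtype=float) ** 2, q.shape)
    inner = (slice(1, -1),) * 3
    neighbours = [(axis, step) for axis in range(3) for step in (-1, 1)]
    # Dielectric on the link to each of the six neighbours (arithmetic mean, an assumption)
    eps_link = [0.5 * (eps[inner] + _shift(eps, ax, st)) for ax, st in neighbours]
    eps_sum = sum(eps_link)
    source = 4.0 * np.pi * q[inner] / h
    salt = kb2[inner] * h**2

    phi = np.zeros_like(q)  # boundary faces stay at phi = 0 (assumption)
    for iteration in range(1, max_iter + 1):
        old = phi[inner].copy()
        numerator = source + sum(e * _shift(phi, ax, st)
                                 for e, (ax, st) in zip(eps_link, neighbours))
        # Jacobi: every interior point is updated from the previous sweep's values
        phi[inner] = numerator / (eps_sum + salt * sinh_over_phi(old))
        if np.max(np.abs(phi[inner] - old)) < tol:
            break
    return phi, iteration


# Usage: one unit charge at the centre of a uniform low-dielectric box
n, mid = 21, 10
q = np.zeros((n, n, n))
q[mid, mid, mid] = 1.0
eps = np.full((n, n, n), 2.0)
phi_no_salt, it_no_salt = solve_fd_npbe(q, eps, kappa_bar=0.0)
phi_salt, it_salt = solve_fd_npbe(q, eps, kappa_bar=0.5)
print(phi_no_salt[mid, mid, mid], phi_salt[mid, mid, mid], it_no_salt, it_salt)
```

Adding salt (κ̄ > 0) lowers the central potential because the κ̄²h² term enlarges the denominator of the update; the symmetric decay away from the charge is a quick sanity check. Halving h over the same box multiplies the number of grid points by eight, which is the cost that focusing and multigrid methods are designed to contain.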
Several computer graphics packages that allow the potential to be displayed and analyzed are available; the most widely used is GRASP (Nicholls et al., 1991). The potential surfaces of many proteins have been shown to be involved in molecular recognition, binding of proteins to nucleic acids and lipids, binding of substrates to enzymes, and protein-protein interactions. Several examples are discussed in the sixth section.
Calculating Free Energies and pKa Values
Formulating atomic-level models of molecular function requires knowledge of the state of ionization of individual residues, which in turn requires knowledge of their pKa values. Until 1990, methods for calculating complete sets of pKa values with grid models were not available. In 1991, however, Bashford and Karplus described a successful approach based on prior work by Jorgensen (1989). The free energy of ionization of a titratable group in a folded protein is related to the free energy of ionization of the same group in a model compound (generally a dipeptide) by the thermodynamic cycle:
$$\begin{array}{ccc} \mathrm{M_s} + \mathrm{H^+} & \xrightarrow{\;\Delta G^{\mathrm{M}}_{\mathrm{ion}}\;} & \mathrm{M_sH} \\ \big\downarrow{\scriptstyle \Delta G^{\mathrm{M}}_{\mathrm{transfer}}} & & \big\downarrow{\scriptstyle \Delta G^{\mathrm{MH}}_{\mathrm{transfer}}} \\ \mathrm{M_p} + \mathrm{H^+} & \xrightarrow{\;\Delta G^{\mathrm{P}}_{\mathrm{ion}}\;} & \mathrm{M_pH} \end{array}$$
Scheme 1.
where $\mathrm{M_sH}$ is the protonated group in a model compound in solution, $\mathrm{M_pH}$ is the protonated group in the protein, and $\mathrm{M_s}$ and $\mathrm{M_p}$ are the unprotonated forms of the same species. $\Delta\Delta G_{\mathrm{ion}}$, the difference between the free energies of ionization of the titratable group in the folded protein and in the model compound, is in turn equal to $\Delta\Delta G_{\mathrm{transfer}}$, the difference between the free energies of transferring the protonated and unprotonated forms of the group from the model compound to the protein. $\Delta\Delta G_{\mathrm{transfer}}$ can in turn be broken down into three terms:
1) $\Delta\Delta G_{\mathrm{Born}}$, the free energy change due to changes in solvation of the group;
2) $\Delta\Delta G_{\mathrm{dipole}}$, the free energy change due to interactions between the charged form of the group and dipoles in the protein;
3) $\Delta\Delta G_{\mathrm{site\text{-}site}}$, the free energy change due to interactions with the charged forms of other titratable groups.
$$\Delta\Delta G_{\mathrm{transfer}} = \Delta\Delta G_{\mathrm{Born}} + \Delta\Delta G_{\mathrm{dipole}} + \Delta\Delta G_{\mathrm{site\text{-}site}} = \Delta G^{\mathrm{MH}}_{\mathrm{transfer}} - \Delta G^{\mathrm{M}}_{\mathrm{transfer}}$$
Each of the terms in $\Delta\Delta G_{\mathrm{transfer}}$ can be calculated from the same cycle by turning off the dipoles and/or the other site charges during the calculation. A one-unit change in the pKa of a residue corresponds to a $\Delta\Delta G_{\mathrm{transfer}}$ of only about 1.36 kcal·mol⁻¹ ($2.303RT$ at 25 °C), whereas each of the terms of $\Delta\Delta G_{\mathrm{transfer}}$ may have a magnitude of tens of kcal·mol⁻¹. Summing several large, compensating free energies to obtain a small overall free energy is one source of error in these computations; as a result, calculated pKa values may differ from experimental values by many pH units. Differences between the structure used in the calculation and the average structure of the molecule in solution are a major source of error in the calculated components of $\Delta\Delta G_{\mathrm{transfer}}$, and thus in the calculated pKa values of individual sites. Beginning with high-resolution, well-refined structures is therefore crucial. The results are also sensitive to the atom of the group to which the charge is assigned (e.g., which nitrogen of a guanidino group is assumed to carry the ionizable proton). If the charge is assigned to a single atom, both solvent accessibility and the proximity of other charges should be considered in making the assignment (Oberoi and Allewell, 1993). Recent calculations incorporate all possibilities (cf. Bashford et al., 1993). Ultimately, it will also be desirable to incorporate the effects of molecular fluctuations by deriving the time-averaged structure from molecular dynamics.
Calculating Titration Curves
When pKa values have been calculated, titration curves can be calculated simply by applying the Henderson-Hasselbalch equation to each titratable group:
$$\log\frac{[\mathrm{A^-}]}{[\mathrm{HA}]} = \mathrm{pH} - \mathrm{p}K_a$$
This approach was first used by Tanford and Roxby (1972). Alternative statistical mechanical approaches have been developed by Bashford and Karplus (1991; Bashford and Gerwert, 1992) and by Karshikov (1995). The fractional protonation $\theta_i$ of each ionizable group $i$ is calculated with a Boltzmann-weighted scheme:
$$\theta_i = \frac{\sum_{\{x\}} x_i\, e^{-\beta\Delta G(x) - \nu(x)\,2.303\,\mathrm{pH}}}{\sum_{\{x\}} e^{-\beta\Delta G(x) - \nu(x)\,2.303\,\mathrm{pH}}}$$
where the sums run over all protonation states $\{x\}$, $x_i$ is 1 if group $i$ is protonated in state $x$ and 0 otherwise, $\Delta G(x)$ is the free energy of the reaction Protein + $\nu(x)$H⁺ → PH$_{\nu(x)}$, PH$_{\nu(x)}$ is the protein in protonation state $x$, $\nu(x)$ is the number of protons added to bring the molecule to state $x$, and $\beta = 1/kT$.
This approach is much more computationally expensive than the Tanford-Roxby approach and is therefore generally used only for strongly interacting groups. A number of techniques have been developed to reduce the computational cost of this summation (Beroza et al., 1991; Gilson, 1993; Yang and Honig, 1993).
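The two routes to titration curves described above can be compared on a toy system. The sketch below assumes a standard site-site model that the text does not spell out: each site has an intrinsic pKa, acids carry charge x − 1 and bases charge x (x = 1 when protonated), and pairwise interactions are given by a symmetric matrix W in units of kT with a zero diagonal. The function `tanford_roxby` iterates the Henderson-Hasselbalch equation with each pKa shifted by the mean charges of the other sites; `exact_titration` evaluates the Boltzmann-weighted sum over all 2^N protonation states, which shows directly why the exact route becomes expensive as N grows. A helper converting a free energy change of protonation into a pKa shift (2.303RT per pH unit) is included. All names and numbers are hypothetical.

```python
import itertools
import math

import numpy as np

LN10 = math.log(10.0)
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def pka_shift_from_ddg(ddg_kcal, temperature_k=298.15):
    """pKa shift caused by a change ddg (kcal/mol) in the free energy of protonation.

    Sign convention of this sketch: ddg < 0 (protonation more favourable in the
    protein than in the model compound) raises the pKa.
    """
    return -ddg_kcal / (LN10 * R_KCAL * temperature_k)


def _charge_offset(is_acid):
    # charge = x - 1 for acids, x for bases (x = 1 when protonated)
    return np.where(np.asarray(is_acid, dtype=bool), -1.0, 0.0)


def tanford_roxby(pk, is_acid, w, ph, n_iter=500, tol=1e-10):
    """Mean-field titration: Henderson-Hasselbalch per site with a charge-shifted pKa."""
    pk, w = np.asarray(pk, float), np.asarray(w, float)
    z = _charge_offset(is_acid)
    theta = 1.0 / (1.0 + 10.0 ** (ph - pk))            # non-interacting starting guess
    for _ in range(n_iter):
        # nearby positive charge lowers the pKa, nearby negative charge raises it
        pk_app = pk - (w @ (theta + z)) / LN10
        new = 1.0 / (1.0 + 10.0 ** (ph - pk_app))
        if np.max(np.abs(new - theta)) < tol:
            return new
        theta = 0.5 * (theta + new)                     # damping for strongly coupled sites
    return theta


def exact_titration(pk, is_acid, w, ph):
    """Fractional protonation from the Boltzmann-weighted sum over all 2**N states."""
    pk, w = np.asarray(pk, float), np.asarray(w, float)
    z = _charge_offset(is_acid)
    states = np.array(list(itertools.product((0.0, 1.0), repeat=len(pk))))
    charges = states + z
    # -beta*dG(x) - nu(x)*2.303*pH, up to a constant that cancels on normalisation
    log_w = LN10 * states @ (pk - ph) - 0.5 * np.einsum("si,ij,sj->s", charges, w, charges)
    weights = np.exp(log_w - log_w.max())
    return weights @ states / weights.sum()


# Usage: two hypothetical carboxylates (intrinsic pKa 4) coupled by a 2 kT repulsion
pk = [4.0, 4.0]
acid = [True, True]
w = [[0.0, 2.0], [2.0, 0.0]]
for ph in (3.0, 4.0, 5.0):
    print(ph, exact_titration(pk, acid, w, ph).round(3), tanford_roxby(pk, acid, w, ph).round(3))
print(round(pka_shift_from_ddg(-1.364), 2))  # about one pH unit
```

Both methods raise the apparent pKa of the coupled carboxylates above 4, but at pH 4 the mean-field iteration gives about 0.66 protonation against about 0.64 for the exact sum. Discrepancies of this kind are why the statistical mechanical treatment is reserved for strongly interacting groups.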
EXAMPLES OF THEORETICAL STUDIES
Protein Stability
As a result of the current widespread interest in protein folding, the contribution of electrostatic effects to protein stability has become an area of emphasis. Contributions to both the unfolded and the folded protein must be evaluated, since $\Delta G_{\mathrm{folding}}$ is the free energy difference between the folded and unfolded states. In addition to the theoretical developments discussed below, a number of laboratories have been addressing this question experimentally (Tan et al., 1995; Meeker et al., 1996; Oliveberg and Fersht, 1996). Dill and colleagues have developed a very simple yet successful model of protein folding, "heteropolymer collapse," based on principles of polymer chemistry, in which the initial step is collapse of the unfolded protein into a compact structure lacking the tertiary interactions present in the native protein. Stigter and colleagues (1991) have incorporated electrostatic effects into this model by adding a step at the beginning of the folding process, in which the charged, unfolded protein is discharged, and a step at the end, in which the folded protein is charged. The charged, folded protein is treated as a solid sphere with the charge uniformly distributed on the surface, whereas the unfolded protein is treated as a porous sphere with charges uniformly distributed throughout. The charge on the folded protein is calculated from experimentally derived pKa values, and the charge on the unfolded protein by solving the Poisson-Boltzmann equation. This very simple model, which is reminiscent of the Linderstrom-Lang model formulated more than 60 years earlier, correctly predicts for apomyoglobin the existence of "molten globules" at acid pH, denaturation temperatures as a function of pH, and stability as a function of ionic strength and pH. Yang and Honig (1993) have also addressed this problem, using lysozyme as the test case.
Their pKa values for the folded protein were calculated by the finite difference method for several structures from a molecular dynamics run beginning with the crystal structure. pKa values in the unfolded protein were assumed to be those of model peptides. The equilibrium constant for unfolding was then calculated as a function of pH by integrating the standard linkage equation:
$$\frac{\partial \ln K}{\partial \ln a_{\mathrm{H^+}}} = \Delta\nu_{\mathrm{H^+}}$$
where $\Delta\nu_{\mathrm{H^+}}$ is the difference between the numbers of protons bound by the unfolded and the folded protein. They conclude from their results that, although specific ion pairs may be stabilizing, ionizable groups generally destabilize proteins because of desolvation effects. pH-dependent unfolding appears to be due to individual groups with anomalous pKa values, whose location on the protein surface may determine the nature of the unfolded state. Warwicker (1994) has analyzed the structural basis of the differential inactivation of polio and rhinovirus at acid pH, using the finite difference approach to calculate pKa values. Both charge-charge interactions at the subunit interfaces and the flexibility of β-strands that shield some of these ionizable groups from the solvent appear to influence stability. Differences in the stability of the two viruses at acid pH are attributed to replacement of an uncharged residue in a β-strand in rhinovirus by an ionizable residue in poliovirus. Karshikov and colleagues (1991) used both modified Tanford-Kirkwood theory and the finite difference approach to examine the role of electrostatics in subunit interactions in constitutive phycocyanin from Fremyella diplosiphon. Because the potential surfaces calculated with the two methods were similar, site-site interaction energies were calculated only with the less computationally demanding Tanford-Kirkwood approach. Complementary potential surfaces were shown to promote formation of αβ dimers as well as (αβ)₃ and (αβ)₆ aggregates, illustrating the importance of potential surfaces in protein-protein interactions.
Estimating pKa Values
Although several comparisons between pKa values derived with modified Tanford-Kirkwood theory and experimental values were made in the 1980s, recent comparisons have relied on finite difference calculations. The first study of this type was the calculation carried out on lysozyme by Bashford and Karplus (1990), which showed that half of the calculated pKa values were within one unit of the experimental values and that larger differences generally arose from overestimating the pKa shift that results from incorporating the ionizable group into the protein. This was an important step forward, since previous applications of the finite difference method had generally yielded wildly unrealistic values. Bashford and Gerwert (1992) carried out similar calculations for bacteriorhodopsin and were able to reproduce several large pKa shifts that had been demonstrated experimentally. Their results indicate that large, unfavorable desolvation energies are compensated by strong, favorable charge-charge interactions, which in turn generate complex titration behavior. They suggest that these features may be characteristic of proteins whose function includes proton transfer. Comparisons between experimental and calculated pKa values for carbonmonoxy sperm whale myoglobin constituted one of the principal validations of the modified Tanford-Kirkwood approach. Bashford and colleagues (1993) recently carried out an analogous comparison of pKa values calculated by the finite difference method and determined experimentally by multidimensional nuclear magnetic resonance spectroscopy. A new approach to the treatment of tautomers was proposed in which each His was considered to have two protonation sites, and charge models, atomic radii, and coordinates were varied in order to test the sensitivity of the results to these parameters. The calculations reproduced the titration behavior of His residues within the protein reasonably well but overestimated the pK$_{1/2}$ values for Tyr by a few pH units. Strong interactions between titrating groups resulted in Hill coefficients of less than one. These results are reminiscent of the experimental results of Forman-Kay and colleagues (1992) for human thioredoxin (third section).
Free Energies of Ligand Binding
Current interest in rational drug design has in turn stimulated interest in developing methods for predicting the magnitude of electrostatic effects in ligand binding. While most calculations still use only classical electrostatic theory, they are beginning to be combined with more powerful ab initio approaches, as illustrated by a recent analysis of p-hydroxybenzoate hydroxylase (Perakyla and Pakkanen, 1995). Electrostatic contributions to ligand binding calculated by classical methods were combined with gas-phase interaction energies calculated by ab initio molecular mechanics and desolvation energies calculated with a semiempirical quantum mechanical approach to provide estimates of binding energies. Agreement between experimental and calculated free energies was quite good; the relationship was linear, with a correlation coefficient of 0.9. Free energy simulation techniques have also been used to model protein-ligand interactions (reviewed in Kollman, 1993). Free energy simulations have been used to calculate binding free energies in the design of inhibitors of thymidylate synthetase (Reddy et al., 1991) and to calculate relative binding free energies of inhibitors of dihydrofolate reductase (Cummins and Gready, 1993). Free energy perturbation and finite difference Poisson-Boltzmann calculations have been used to estimate relative free energies of hydration (Ewing and Lybrand, 1994), to predict the energetics of protein-protein interactions in solution (reviewed in Stoddard and Koshland, 1993), and to model conformational change in the docking of a putative ligand to a protein receptor site (Leach, 1994). Continuum methods based on the finite difference solution of the Poisson-Boltzmann equation have also been used to calculate free energies and binding constants directly in several systems (cf. Misra et al., 1994; Antosiewicz et al., 1994). An important paper by MacKerell and colleagues (1995) represents a major advance in this area.
Their goal was to predict the pH dependence of the relative binding of different ligands to an enzyme. The binding constants calculated were macroscopic constants that included all possible states of ionization of both the protein and the protein-inhibitor complex, since microscopic constants defined in terms of individual chemical species generally cannot be determined experimentally. The approach taken was first to calculate, by free energy simulation (reviewed by Brooks et al., 1988; Beveridge and DiCapua, 1989), the difference in the free energy of binding of a pair of ligands for a single protonation state of the enzyme-ligand complex. Other protonation states were included by carrying out Poisson-Boltzmann calculations to determine free energies of ionization, from which binding polynomials can be derived. MacKerell and colleagues (1995) illustrate their approach by analyzing the binding of 2'-GMP and 3'-GMP to ribonuclease T1. Molecular dynamics and a hybrid potential energy function were used to calculate the free energy differences for converting 2'-GMP to 3'-GMP both in solution and when bound to the enzyme at pH 5, where binding of both inhibitors is strongest. Given the thermodynamic cycle:
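$$\begin{array}{ccc} \mathrm{I}_{(aq)} & \xrightarrow{\;\Delta G^{\mathrm{I}}_{aq\to E}\;} & \mathrm{I}_{(E)} \\ \big\downarrow{\scriptstyle \Delta G^{aq}_{\mathrm{I}\to\mathrm{I'}}} & & \big\downarrow{\scriptstyle \Delta G^{E}_{\mathrm{I}\to\mathrm{I'}}} \\ \mathrm{I'}_{(aq)} & \xrightarrow{\;\Delta G^{\mathrm{I'}}_{aq\to E}\;} & \mathrm{I'}_{(E)} \end{array}$$
Scheme 2.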
where I and I' are the two inhibitors, and aq and E designate the inhibitor in solution and on the enzyme surface, respectively, the difference in the free energy of binding is given by $\Delta G^{\mathrm{I'}}_{aq\to E} - \Delta G^{\mathrm{I}}_{aq\to E}$. Since the total free energy change around the cycle must be zero, this is equal to $\Delta G^{E}_{\mathrm{I}\to\mathrm{I'}} - \Delta G^{aq}_{\mathrm{I}\to\mathrm{I'}}$. In words, the difference between the free energies of binding the two inhibitors is equal to the difference between the free energies of converting I to I' when bound to the enzyme and in solution. A similar approach is used to calculate the difference in the free energy of binding of each inhibitor to two forms of the enzyme that differ in the protonation state of a single residue. Here the thermodynamic cycle is:
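$$\begin{array}{ccc} \mathrm{E} + \mathrm{H^+} + \mathrm{I} & \xrightarrow{\;K^{\mathrm{I}}_{aq\to E}\;} & \mathrm{EI} + \mathrm{H^+} \\ \big\downarrow{\scriptstyle K_{E\to EH^+}} & & \big\downarrow{\scriptstyle K^{\mathrm{EI}}_{E\to EH^+}} \\ \mathrm{EH^+} + \mathrm{I} & \xrightarrow{\;K^{\mathrm{I}}_{aq\to EH^+}\;} & \mathrm{EH^+I} \end{array}$$
Scheme 3.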
The quantity of interest is $\Delta G^{\mathrm{I}}_{aq\to EH^+} - \Delta G^{\mathrm{I}}_{aq\to E}$, which is equal to $\Delta G^{\mathrm{EI}}_{E\to EH^+} - \Delta G_{E\to EH^+}$, the difference between the free energies of protonating the liganded and unliganded protein. The calculations correctly predict the relative affinities of the two inhibitors for the enzyme but overestimate the difference in free energy of binding by about 4 kcal·mol⁻¹. They also correctly predict the magnitude and pH dependence of the limited set of experimentally determined binding constants for each inhibitor. They provide considerable insight into the many factors that give rise to the difference in free energy of binding, and they identify with a reasonable degree of certainty the groups that function as the general acid and base in the catalytic mechanism. The discussion of the effect of bound solvent on pKa values is a particularly interesting feature of the analysis.
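The bookkeeping in Schemes 2 and 3 reduces to simple arithmetic once the simulated free energies and pKa values are in hand. The sketch below, with hypothetical function names and numbers, closes the Scheme 2 cycle and then builds a macroscopic binding constant for an enzyme with one titratable residue by summing over its two protonation states, which gives a ratio of binding polynomials. It is a simplified illustration that assumes a single titrating residue, not MacKerell and colleagues' full treatment.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def relative_binding_free_energy(dg_convert_on_enzyme, dg_convert_in_solution):
    """Scheme 2: dG_bind(I') - dG_bind(I) = dG(I -> I' on E) - dG(I -> I' in solution)."""
    return dg_convert_on_enzyme - dg_convert_in_solution


def apparent_binding_constant(k_bind_e, pka_free, pka_complex, ph):
    """Macroscopic association constant when one enzyme residue titrates.

    k_bind_e is the constant for the deprotonated enzyme E. Closing Scheme 3 gives
    K(EH+) = K(E) * 10**(pKa_complex - pKa_free); summing over both protonation
    states of free enzyme and complex yields the ratio of binding polynomials below.
    """
    return (k_bind_e * (1.0 + 10.0 ** (pka_complex - ph))
            / (1.0 + 10.0 ** (pka_free - ph)))


def binding_free_energy(k_assoc, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from an association constant in 1/M."""
    return -R_KCAL * temperature_k * math.log(k_assoc)


# Usage with hypothetical numbers: a negative result means I' binds more tightly than I
print(relative_binding_free_energy(-6.0, -4.5))

# Hypothetical residue whose pKa rises from 5 to 7 when the inhibitor binds
for ph in (3.0, 5.0, 7.0, 9.0):
    k_app = apparent_binding_constant(1e4, pka_free=5.0, pka_complex=7.0, ph=ph)
    print(ph, f"{k_app:.3g}", round(binding_free_energy(k_app), 2))
```

With the residue's pKa rising by two units on binding, the apparent constant climbs by a factor of 100 at low pH, equivalent to about 2.7 kcal·mol⁻¹ (2 × 2.303RT at 25 °C).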
Kinetics of Ligand Binding
Inspection of potential surfaces has often given rise to the suggestion that they could accelerate rates of ligand binding. This hypothesis can be examined by modeling the effects of the potential surface on the Brownian dynamics of the ligand and calculating the probability that a ligand starting at a specified distance from the protein reacts rather than diffusing away (Allison et al., 1985).
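The simulation strategy described above can be sketched as follows, under simplifying assumptions: a spherical, uniformly reactive target of radius a, trajectories started on a sphere of radius b, an escape sphere of radius q, Brownian steps with an optional drift D·F·Δt/kT from a central force, and no hydrodynamic interactions. The correction for trajectories that would have returned from beyond q uses the Northrup-Allison-McCammon form for free diffusion outside b, which is assumed here. Function names and parameters are hypothetical; without a force, the result can be checked against the exact hitting probability.

```python
import numpy as np


def reaction_probability(a, b, q, diff=1.0, dt=5e-4, n_traj=5000,
                         force=None, max_steps=200_000, seed=0):
    """Fraction of trajectories started at radius b that reach radius a before radius q.

    force: optional callable mapping positions (n, 3) to forces in kT per unit length.
    Starting every walker at (b, 0, 0) is adequate only for a spherically symmetric
    target and a central force, which is all this sketch handles.
    """
    rng = np.random.default_rng(seed)
    pos = np.zeros((n_traj, 3))
    pos[:, 0] = b
    active = np.ones(n_traj, dtype=bool)
    reacted = np.zeros(n_traj, dtype=bool)
    sigma = np.sqrt(2.0 * diff * dt)
    for _ in range(max_steps):
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break
        step = sigma * rng.standard_normal((idx.size, 3))
        if force is not None:
            step += diff * dt * force(pos[idx])   # drift term D*F*dt/kT
        pos[idx] += step
        r = np.linalg.norm(pos[idx], axis=1)
        hit, escaped = r <= a, r >= q
        reacted[idx[hit]] = True
        active[idx[hit | escaped]] = False
    return reacted.mean()


def free_diffusion_beta(a, b, q):
    """Exact hitting probability for force-free diffusion between spheres a < b < q."""
    return (1.0 / b - 1.0 / q) / (1.0 / a - 1.0 / q)


def beta_infinity(beta, b, q):
    """Correct beta for returns from beyond q, assuming free diffusion outside b."""
    omega = b / q                      # k_D(b) / k_D(q) for free diffusion
    return beta / (1.0 - (1.0 - beta) * omega)


def rate_constant(diff, b, beta_inf):
    """k = k_D(b) * beta_inf, with the Smoluchowski rate k_D(b) = 4*pi*D*b."""
    return 4.0 * np.pi * diff * b * beta_inf


# Usage: force-free sanity check
a, b, q = 1.0, 1.5, 3.0
beta = reaction_probability(a, b, q)
beta_inf = beta_infinity(beta, b, q)
print(beta, free_diffusion_beta(a, b, q))
print(beta_inf, a / b)
print(rate_constant(1.0, b, beta_inf), 4.0 * np.pi * a)
```

Without a force, the simulated β lies close to the exact value of 0.5 (slightly below it, because crossings within a time step are missed), and β∞ approaches a/b, recovering the Smoluchowski rate 4πDa. Passing a `force` derived from an attractive central potential raises β, which is the signature of electrostatic steering.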
In acetylcholinesterase, a funnel of negative potential extending outward from the active site can plausibly be involved in catalysis, and the first calculations were consistent with this possibility (cf. Antosiewicz et al., 1994). However, a set of mutations that eliminated negative charges had only modest effects on rates of hydrolysis (Shafferman et al., 1994). Subsequent papers by Antosiewicz and colleagues (1995, 1996) demonstrate just how complex the design and interpretation of mutagenesis experiments can be. They show that a number of mutants, including those generated by Shafferman and colleagues (1994), have only small effects on calculated encounter rates. Electrostatic steering is nevertheless important, however, because varying the ionic strength and eliminating the charge on the substrate both affect measured rates of hydrolysis and calculated encounter rates. A future challenge will be to find mutations that produce larger changes in calculated and experimentally observed rates. Similar attempts to understand quantitatively the role of electrostatics in protein-protein association rates are in progress (Janin, 1997; Schreiber and Fersht, 1996).
Enzyme Mechanisms
Very few applications of electrostatic modeling to enzyme mechanisms have been reported as yet, perhaps because of skepticism about the applicability of classical continuum models to catalysis. Warwicker and colleagues (1994) examined the effects of alcohols on phospholipase A2 experimentally and through modeling, to test the utility of two refinements to continuum electrostatic models proposed in a previous paper (Warwicker, 1994). The first is a double layer of solvent, in which the volume traced out on the protein by a single solvent molecule (with a 1.4 Å radius) is assigned a dielectric constant of 30, as proposed in the smeared dipole model of Onsager (1936), while the volume beyond this inner solvent layer is assigned a dielectric constant of 80, as proposed by Kirkwood (1939). The second refinement is the use of a dielectric that saturates in high electric fields (for example, near charged side chains) and is adjusted throughout the calculation. The reduction in activity produced by alcohols was shown to result primarily from destabilization of the transition state rather than from changes in the pKa values of groups known to be involved in catalysis. The effects of mutations were also predicted correctly. This example demonstrates that continuum methods can be useful in predicting trends, at least in some cases.
The role of electrostatic effects in the catalytic and regulatory mechanisms of E. coli aspartate transcarbamylase has been analyzed by Oberoi and colleagues (1996) using the finite difference method with multigridding. This is the largest system to which this approach has been applied to date. A number of interactions over distances too large for direct ion-pair formation were identified, and it was proposed that these long-range interactions may be involved in the allosteric mechanism.
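The first of the two refinements proposed by Warwicker (1994) amounts to a rule for assigning dielectric constants to grid points. The sketch below approximates the volume traced out by a single solvent molecule as a shell one probe diameter (2 × 1.4 Å) thick around the van der Waals surface, rather than the rolling-probe construction a real implementation would use; the protein interior value of 4 is an assumption, and the field-dependent saturation of the second refinement is not modeled. The function name `double_layer_dielectric` is hypothetical.

```python
import numpy as np

PROBE_RADIUS = 1.4  # Å, radius of a solvent molecule as quoted in the text


def double_layer_dielectric(grid_points, atom_xyz, atom_radii, eps_protein=4.0,
                            eps_inner=30.0, eps_bulk=80.0, probe_radius=PROBE_RADIUS):
    """Dielectric constant at each grid point for a two-layer solvent model.

    Inside any atomic sphere -> eps_protein (assumed value); within one probe
    diameter of the van der Waals surface -> eps_inner (first solvent layer);
    further out -> eps_bulk.
    """
    pts = np.asarray(grid_points, dtype=float)[:, None, :]   # (n_points, 1, 3)
    centres = np.asarray(atom_xyz, dtype=float)[None, :, :]  # (1, n_atoms, 3)
    radii = np.asarray(atom_radii, dtype=float)[None, :]     # (1, n_atoms)
    # distance from each point to the nearest atomic surface (negative = inside an atom)
    surface_dist = (np.linalg.norm(pts - centres, axis=2) - radii).min(axis=1)
    eps = np.full(surface_dist.shape, eps_bulk)
    eps[surface_dist < 2.0 * probe_radius] = eps_inner
    eps[surface_dist < 0.0] = eps_protein
    return eps


# Usage: a single hypothetical atom of radius 2 Å, probed along the x axis
points = [[0, 0, 0], [3, 0, 0], [4.7, 0, 0], [5, 0, 0], [10, 0, 0]]
print(double_layer_dielectric(points, [[0, 0, 0]], [2.0]))
```

The resulting array can serve as the dielectric map for a finite difference solver; replacing the distance shell with a proper solvent-accessible construction would be the first improvement to make.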
Redox Potentials
The thioredoxin family of proteins has disulfide bonds that can be reversibly oxidized and reduced, enabling the proteins to function in redox reactions in cells. The various members of this family have different redox potentials, and considerable effort has gone into elucidating the structural basis of these differences. Differences in redox potential are directly proportional to differences in the relative stabilities of the oxidized and reduced forms of these proteins. Since the thiol groups in the reduced protein are ionizable, interactions between their ionized forms and other groups in the protein could contribute to the difference in stability between the oxidized and reduced forms. Langsetmo and colleagues (1991a and b) established the linkage between thioredoxin stability and the titration of specific amino acids in the protein:
$$\Delta\Delta G = 2.303RT\,\Delta\mathrm{p}K_a$$
where $\Delta\Delta G$ is the contribution of the titration of an ionizable group to the stability of the protein and $\Delta\mathrm{p}K_a$ is the change in the pKa of that group produced by unfolding of the protein. On the basis of the elevated pKa of an Asp residue near one of the thiol groups, and the fact that this pKa shifts when thioredoxin is oxidized or reduced, both the Asp and a Lys with which it can interact may be involved in regulating the redox potential of thioredoxin (Langsetmo et al., 1991a and b). However, a comparison of finite difference calculations on E. coli thioredoxin and DsbA, a homolog containing a thioredoxin domain and a 76-residue insert that is primarily α-helical, indicates that other factors may also be important in DsbA (Gane et al., 1995). These include other ionizable residues, some of which are present only in DsbA, interactions with backbone dipoles, and the presence of a low-dielectric region near the active site in DsbA.
Interactions with Lipids
Potential surface calculations have recently been used in two systems to provide insight into the mechanism of lipid-protein interactions. In the first study, Lakey and colleagues (1994) carried out calculations on several colicins. Colicins are bacterial toxins that kill enterobacteria in a process that requires binding to a receptor on the outer membrane, translocation to the inner membrane, and insertion into the inner membrane. The authors found that, despite large differences in isoelectric point, the long-range potential surfaces of naturally occurring colicins were similar, with an extensive positive region and a negative dome that probably orients the colicin with respect to the negatively charged membrane. Parallel experimental studies have shown that eliminating several negatively charged residues affects in vitro activity. In a similar study, Scott and colleagues (1994) compared the potential surfaces of several phospholipases A2. All of the potential surfaces had a distinct molecular sidedness. The results were consistent with delocalized molecular electrostatics playing a role in orienting and holding phospholipases A2 at water-lipid interfaces; however, mutational results also implicate hydrophobic interactions. The conclusion that electrostatics is unlikely to be the only factor probably applies to many other systems.
Hormone-Receptor Interactions
Demchuk and colleagues (1994) applied the potential surface approach to investigate the role of electrostatic effects in the binding of four-helix-bundle growth factors to their receptors. The potential surfaces of hormones that bind to identical receptor subunits have twofold rotational symmetry, despite differences in sequence, whereas the potential surfaces of hormones that bind to heterooligomeric receptors lack this symmetry.
Future Prospects
There has been enormous progress in our ability to model electrostatic effects in proteins in recent years. Grid methods have substantially increased the amount of molecular detail that can be incorporated into models, while methods that allow free energies and kinetic constants to be calculated have increased the range of questions that can be addressed. As a result, there is currently intense interest in carrying out parallel experimental and theoretical studies in many systems. Increases in accuracy, however, have lagged far behind increases in the complexity of the models used in theoretical calculations. Further improvements in the finite difference approach are possible, as discussed by Warwicker (1994) and others. The finite element method improves the accuracy with which the molecule is mapped onto the grid, although at considerable computational expense (You and Harvey, 1993). The greatest shortcomings of current models are the neglect of molecular fluctuations, ion binding, and the details of solvent structure. Incorporating molecular fluctuations is largely a matter of having sufficient computer memory and time available, since molecular dynamics methods are well developed (cf. McCarrick and Kollman, 1994). Incorporating ion binding, pH effects, and more realistic solvent models is also challenging because the experimental information available is limited; however, these problems are the focus of much of the most active current research (Garcia-Moreno, 1994; Warwicker, 1994; Coitino et al., 1995; Sharp et al., 1995; Dimitrov and Crichton, 1997; Alexov and Gunner, 1997; Zhou and Vijayakumar, 1997). Effective models should represent the three principal components of the system (the solvent, the ions, and the solute, i.e., the macromolecule) in sufficient detail. They should also capture the dynamic nature of the system, including the motion of all three components.
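The grid idea behind the finite difference approach can be illustrated in one dimension. The sketch below, a toy example rather than any published program, solves the linearized Poisson-Boltzmann (Debye-Hückel) equation d²φ/dx² = κ²φ on a uniform grid with the potential fixed at both ends, and checks the result against the exact sinh solution. The assumptions are deliberately simple: a uniform dielectric, no fixed charges inside the domain, a dimensionless potential, and a hypothetical inverse Debye length κ = 0.1 Å⁻¹ (a Debye length of 10 Å, roughly that of 0.1 M monovalent salt). The function names are illustrative. Protein calculations instead use three-dimensional grids with position-dependent dielectric constant and ionic strength, solved iteratively or by multigrid methods.

```python
import math


def solve_linear_pb_1d(kappa, length, n_interior, phi_left, phi_right):
    """Finite-difference solution of phi'' = kappa**2 * phi on [0, length].

    Central differences give, at each interior node i,
        phi[i-1] - (2 + (kappa*h)**2) * phi[i] + phi[i+1] = 0,
    a tridiagonal system solved here with the Thomas algorithm.
    Returns (x, phi), including both fixed (Dirichlet) boundary nodes.
    """
    h = length / (n_interior + 1)
    diag = -(2.0 + (kappa * h) ** 2)  # main diagonal
    off = 1.0                         # sub- and super-diagonal
    # Known boundary values are moved to the right-hand side.
    rhs = [0.0] * n_interior
    rhs[0] -= phi_left
    rhs[-1] -= phi_right

    # Forward sweep.
    c_prime = [0.0] * n_interior
    d_prime = [0.0] * n_interior
    c_prime[0] = off / diag
    d_prime[0] = rhs[0] / diag
    for i in range(1, n_interior):
        denom = diag - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (rhs[i] - off * d_prime[i - 1]) / denom

    # Back substitution.
    interior = [0.0] * n_interior
    interior[-1] = d_prime[-1]
    for i in range(n_interior - 2, -1, -1):
        interior[i] = d_prime[i] - c_prime[i] * interior[i + 1]

    x = [i * h for i in range(n_interior + 2)]
    phi = [phi_left] + interior + [phi_right]
    return x, phi


def exact_solution(x, kappa, length, phi_left, phi_right):
    """Analytic solution of the same boundary-value problem."""
    s = math.sinh(kappa * length)
    return (phi_left * math.sinh(kappa * (length - x))
            + phi_right * math.sinh(kappa * x)) / s


if __name__ == "__main__":
    kappa = 0.1    # assumed inverse Debye length, 1/Angstrom
    length = 50.0  # Angstrom
    x, phi = solve_linear_pb_1d(kappa, length, 199, 1.0, 0.0)
    err = max(abs(p - exact_solution(xi, kappa, length, 1.0, 0.0))
              for xi, p in zip(x, phi))
    print(f"max error vs exact solution: {err:.2e}")
```

Because central differences are second-order accurate, halving the grid spacing should reduce the error roughly fourfold; the Thomas algorithm is used here only because a one-dimensional problem yields a tridiagonal system.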
Standard molecular dynamics simulations and free energy simulation techniques alone (reviewed in Beveridge and DiCapua, 1989) are time-consuming and treat the system in a fixed ionization state, while continuum approaches lack a detailed representation of the system and do not capture its dynamic nature. Hence, approaches that combine molecular dynamics simulations with continuum calculations have recently been developed (cf. Gilson, 1995). A recently developed method for sampling potential energy surfaces (Tidor, 1993), used instead of a full free energy simulation, reduces the computational cost considerably. Free energy simulations that take pH dependence into account by calculating the protonation state with continuum calculations have also been described recently (MacKerell et al., 1995). Hybrid models that combine molecular dynamics and continuum methods are relatively rapid and have the additional advantage of allowing ionic strength and pH to be included. They can be used to describe the complete energetics of a macromolecular complex as a sum of terms that include changes in the conformation of the system, a hydrophobic energy based on changes in solvent accessibility, a continuum electrostatic term, and a covalent term describing the bonded geometry.
While electrostatic effects are frequently the subject of experimental studies, very few studies that directly combine experimental and theoretical approaches have been carried out; examples include Bashford and colleagues (1993) and Warwicker et al. (1994). More frequently, predictions from theory are compared with a limited set of experimental data obtained in a different laboratory for a different purpose. This separation of theory and experiment severely limits the synergy required for rapid, productive improvements in the theory. Close collaborations between theoreticians and experimentalists who understand the issues involved in both approaches would be very productive. Despite the popularity and successes of electrostatic modeling, it is important to keep in mind that electrostatics is only one of several factors in protein folding, stability, and function.
Electrostatics should not be overemphasized in interpreting experimental results, and approaches that allow other factors to be investigated need development.
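The linkage relation used above for thioredoxin (Langsetmo et al., 1991a and b), ΔΔG = 2.303RTΔpKa, converts a pKa shift produced by unfolding into a free-energy contribution to stability. The minimal sketch below evaluates it; the temperature of 25 °C and the example shifts are illustrative assumptions, ln 10 replaces the rounded factor 2.303, the function name is hypothetical, and the sign of the result simply follows the sign convention chosen for ΔpKa.

```python
import math

R_J_PER_MOL_K = 8.314462618  # gas constant, J mol^-1 K^-1


def ddg_from_pka_shift(delta_pka, temperature_k=298.15):
    """Free-energy contribution (kJ/mol) of a pKa shift: ddG = ln(10) * R * T * dpKa."""
    return math.log(10) * R_J_PER_MOL_K * temperature_k * delta_pka / 1000.0


if __name__ == "__main__":
    for shift in (0.5, 1.0, 2.0):
        print(f"dpKa = {shift:.1f} -> ddG = {ddg_from_pka_shift(shift):.2f} kJ/mol")
```

At 25 °C a shift of one pH unit corresponds to about 5.7 kJ/mol (about 1.36 kcal/mol), and the contribution scales linearly with the size of the shift.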
REFERENCES
Ackers, G.K. and Halvorson, H.R. (1974). The linkage between oxygenation and subunit dissociation in human hemoglobin. Proc. Natl. Acad. Sci. U.S.A. 71, 4312-4316.
Alber, T. (1989). Mutational effects on protein stability. Ann. Rev. Biochem. 58, 765-798.
Alexiev, U., Marti, T., Heyn, M.P., Khorana, H.G., and Scherrer, P. (1994). Surface charge of bacteriorhodopsin detected with covalently bound pH indicators at selected extracellular and cytoplasmic sites. Biochemistry 33, 298-306.
Alexov, E.G. and Gunner, M.R. (1997). Incorporating protein conformational flexibility into the calculation of pH-dependent protein properties. Biophys. J. 72, 2075-2093.
Allewell, N.M. and Oberoi, H. (1991). Electrostatic effects in protein folding and function. Meth. Enzymol. 202, 3-19.
Allison, S.A., Ganti, G., and McCammon, J.A. (1985). Simulation of the diffusion-controlled reaction between superoxide and superoxide dismutase. I. Simple models. Biopolymers 24, 1323-1336.
Antosiewicz, J., McCammon, J.A., and Gilson, M.K. (1994). Prediction of pH-dependent properties of proteins. J. Mol. Biol. 238, 415-436.
Antosiewicz, J., McCammon, J.A., Wlodek, S.T., and Gilson, M.K. (1995). Simulation of charge-mutant acetylcholinesterases. Biochemistry 34, 4211-4219.
Antosiewicz, J., Wlodek, S.T., and McCammon, J.A. (1996). Acetylcholinesterase: Role of the enzyme's charge distribution in steering charged ligands toward the active site. Biopolymers 39, 85-94.
Bakir, U., Coutinho, P.M., Sullivan, P.A., Ford, C., and Reilly, P.J. (1993). Cassette mutagenesis of Aspergillus awamori glucoamylase near its general acid residue to probe its catalytic and pH properties. Protein Eng. 6, 939-946.
Barker, P.D., Mauk, M.R., and Mauk, A.G. (1991). Proton titration curves of yeast iso-1-ferricytochrome c. Electrostatic and conformational effects of point mutations. Biochemistry 30, 2377-2383.
Bashford, D., Case, D.A., Dalvit, C., Tennant, L., and Wright, P.E. (1993). Electrostatic calculations of side-chain pKa values in myoglobin and comparison with NMR data for histidines. Biochemistry 32, 8045-8056.
Bashford, D. and Gerwert, K. (1992). Electrostatic calculations of the pKa values of ionizable groups in bacteriorhodopsin. J. Mol. Biol. 224, 473-486.
Bashford, D. and Karplus, M. (1990). pKa's of ionizable groups in proteins: Atomic detail from a continuum electrostatic model. Biochemistry 29, 10219-10225.
Beroza, P., Fredkin, D.R., Okamura, M.Y., and Feher, G. (1991). Protonation of interacting residues in a protein by a Monte Carlo method: Application to lysozyme and the photosynthetic reaction center of Rhodobacter sphaeroides. Proc. Natl. Acad. Sci. U.S.A. 88, 5804-5808.
Beveridge, D.L. and DiCapua, F.M. (1989). Free energy via molecular simulation: Applications to chemical and biochemical systems. Ann. Rev. Biophys. Biophys. Chem. 18, 431-492.
Brocklehurst, K. (1994). A sound basis for pH-dependent kinetic studies on enzymes. Protein Eng. 7, 291-299.
Cantor, C.R. and Schimmel, P.R. (1980). Biophysical Chemistry, pp. 847-929. W.H. Freeman and Co., San Francisco.
Cleland, W.W. (1977). Determining the chemical mechanisms of enzyme-catalyzed reactions by kinetic studies. Adv. Enzymol. Relat. Areas Mol. Biol. 45, 273-387.
Coitino, E.L., Tomasi, J., and Cammi, R. (1995). On the evaluation of the solvent polarization apparent charges in the polarization continuum model: A new formulation. J. Comp. Chem. 16, 20-30.
Cummins, P.L. and Gready, J.E. (1993). Computer-aided drug design: A free energy perturbation study on the binding of methyl-substituted pterins and N5-deazapterins to dihydrofolate reductase. J. Comp. Aided Mol. Design 7, 535-555.
Davis, M.E. and McCammon, J.A. (1990). Electrostatics in biomolecular structure and dynamics. Chem. Rev. 90, 509-521.
Delepierre, M., Dobson, C.M., Karplus, M., Poulsen, F.M., States, D.J., and Wedin, R.E. (1987). Electrostatic effects and hydrogen exchange behaviour in proteins. The pH-dependence of exchange rates in lysozyme. Appendix: States, D.J. and Karplus, M., A model for electrostatic effects in proteins. J. Mol. Biol. 197, 111-130.
Demchuk, E., Mueller, T., Oschkinat, H., Sebald, W., and Wade, R.C. (1994). Receptor binding properties of four-helix-bundle growth factors deduced from electrostatic analysis. Protein Sci. 3, 920-935.
Dill, K.A. and Stigter, D. (1995). Modeling protein stability as heteropolymer collapse. Adv. Protein Chem. 46, 59-104.
Dimitrov, R.A. and Crichton, R.R. (1997). Self-consistent field approach to protein structure and stability. I: pH dependence of electrostatic contribution. Proteins: Struct. Funct. Genet. 27, 576-596.
Ellis, K.J. and Morrison, J.F. (1982). Buffers of constant ionic strength for studying pH-dependent processes. Meth. Enzymol. 87, 405-426.
Ewing, T.J.A. and Lybrand, T.P. (1994). A comparison of perturbation methods and Poisson-Boltzmann electrostatics calculations for estimation of relative solvation free energies. J. Phys. Chem. 98, 1748-1752.
Forman-Kay, J.D., Clore, G.M., and Gronenborn, A.M. (1992). Relationship between electrostatics and redox function in human thioredoxin: Characterization of pH titration shifts using two-dimensional homo- and heteronuclear NMR. Biochemistry 31, 3442-3452.
Gane, P.J., Freedman, R.B., and Warwicker, J. (1995). A molecular model for the redox potential difference between thioredoxin and DsbA, based on electrostatic calculations. J. Mol. Biol. 249, 376-387.
Garcia-Moreno, B. (1994). Estimating binding constants for site-specific interactions between monovalent ions and proteins. Meth. Enzymol. 240, 645-667.
Garfin, D.E. (1990a). One-dimensional gel electrophoresis. Meth. Enzymol. 182, 425-441.
Garfin, D.E. (1990b). Isoelectric focusing. Meth. Enzymol. 182, 459-477.
Gao, J., Mammen, M., and Whitesides, G.M. (1996). Evaluating electrostatic contributions to binding with the use of protein charge ladders. Science 272, 535-537.
Gilson, M.K. (1995). Molecular-dynamics simulation with a continuum electrostatic model of the solvent. J. Comp. Chem. 16, 1081-1095.
Gilson, M.K. and Honig, B. (1987). Destabilization of an α-helix-bundle protein by helix dipoles. Proc. Natl. Acad. Sci. U.S.A. 86, 1524-1528.
Harvey, S.C. (1989). Treatment of electrostatic effects in macromolecular modeling. Proteins: Struct. Funct. Genet. 5, 78-92.
Hol, W.G.J., van Duijnen, P.T., and Berendsen, H.J.C. (1978). The α-helix dipole and the properties of proteins. Nature 273, 443-446.
Holst, M. and Saied, F. (1993). Multigrid solution of the Poisson-Boltzmann equation. J. Comp. Chem. 14, 105-113.
Holst, M., Kozack, R.E., Saied, F., and Subramaniam, S. (1994a). Protein electrostatics: Rapid multigrid-based Newton algorithm for solution of the full nonlinear Poisson-Boltzmann equation. J. Biomol. Struct. Dyn. 11, 1437-1445.
Holst, M., Kozack, R.E., Saied, F., and Subramaniam, S. (1994b). Treatment of electrostatic effects in proteins: Multigrid-based Newton iterative method for solution of the full nonlinear Poisson-Boltzmann equation. Proteins: Struct. Funct. Genet. 18, 231-245.
Honig, B. and Nicholls, A. (1995). Classical electrostatics in biology and chemistry. Science 268, 1144-1149.
Honig, B. and Yang, A.S. (1995). Free energy balance in protein folding. Adv. Protein Chem. 46, 27-58.
Janin, J. (1997). The kinetics of protein-protein recognition. Proteins: Struct. Funct. Genet. 28, 153-161.
Jayaram, B., Sharp, K.A., and Honig, B.H. (1989). The electrostatic potential of B-DNA. Biopolymers 28, 975-993.
Jorgensen, W.L. (1989). Free energy calculations: A breakthrough for modeling organic chemistry in solution. Acc. Chem. Res. 22, 184-189.
Karshikov, A. (1995). A simple algorithm for the calculation of multiple site titration curves. Protein Eng. 8, 243-248.
Karshikov, A., Duerring, M., and Huber, R. (1991). Role of electrostatic interaction in the stability of the hexamer of constitutive phycocyanin from Fremyella diplosiphon. Protein Eng. 4, 681-690.
Kirkwood, J.G. (1939). Theory of solutions of molecules containing widely separated charges with special applications to zwitterions. J. Chem. Phys. 2, 351-361.
Klapper, I., Hagstrom, R., Fine, R., Sharp, K.A., and Honig, B. (1986). Focusing of electric fields in the active site of Cu-Zn superoxide dismutase: Effects of ionic strength and amino-acid modification. Proteins: Struct. Funct. Genet. 1, 47-59.
Kollman, P. (1993). Free energy calculations: Applications to chemical and biochemical phenomena. Chem. Rev. 93, 2395-2417.
Lakey, J.H., Parker, M.W., Gonzales-Manas, J.M., Duche, D., Vriend, G., Baty, D., and Pattus, F. (1994). The role of electrostatic charge in the membrane insertion of colicin A. Calculation and mutation. Eur. J. Biochem. 220, 155-163.
Langsetmo, K., Fuchs, J.A., and Woodward, C. (1991a). The conserved, buried aspartic acid in oxidized Escherichia coli thioredoxin has a pKa of 7.5. Its titration produces a related shift on global stability. Biochemistry 30, 7603-7609.
Langsetmo, K., Fuchs, J.A., Woodward, C., and Sharp, K.A. (1991b). Linkage of thioredoxin stability to titration of ionizable groups with perturbed pKa. Biochemistry 30, 7609-7614.
Leach, A.R. (1994). Ligand docking to proteins with discrete side-chain flexibility. J. Mol. Biol. 235, 345-356.
Lee, B. and Richards, F.M. (1971). The interpretation of protein structures: Estimation of static accessibility. J. Mol. Biol. 55, 379-400.
Leger, D. and Hervé, G. (1988). Allostery and pKa changes in aspartate transcarbamoylase from Escherichia coli. Analysis of the pH dependence in the isolated catalytic subunits. Biochemistry 27, 4293-4298.
Linderstrøm-Lang, K. (1924). On the ionization of proteins. Comptes Rendus des Travaux du Laboratoire Carlsberg 15 (7), 1-29.
Loewenthal, R., Sancho, J., Reinikainen, T., and Fersht, A.R. (1993). Long-range surface charge-charge interactions in proteins. J. Mol. Biol. 232, 574-583.
Matthew, J.B., Gurd, F.R.N., Garcia-Moreno, B., Flanagan, M.A., March, K.L., and Shire, S.J. (1985). pH-dependent processes in proteins. CRC Crit. Rev. Biochem. 18, 91-197.
MacKerell, A.D., Jr., Sommer, M.S., and Karplus, M. (1995). pH dependence of binding reactions from free energy simulations and macroscopic continuum electrostatic calculations: Application to 2'GMP/3'GMP binding to ribonuclease T1 and implications for catalysis. J. Mol. Biol. 247, 774-807.
McCarrick, M.A. and Kollman, P. (1994). Use of molecular dynamics and free energy perturbation calculations in anti-human immunodeficiency virus drug design. Meth. Enzymol. 241, 370-384.
McQuarrie, D.A. (1976). Statistical Mechanics. Harper and Row, New York.
Meeker, A.K., Garcia-Moreno, B., and Shortle, D. (1996). Contributions of the ionizable amino acids to the stability of staphylococcal nuclease. Biochemistry 35, 6443-6449.
Misra, V.K., Sharp, K.A., Friedman, R.A., and Honig, B. (1994). Salt effects on ligand-DNA binding. Minor groove binding antibiotics. J. Mol. Biol. 238, 245-263.
Monette, M. and Lafleur, M. (1995). Modulation of melittin-induced lysis by surface charge density of membranes. Biophys. J. 68, 187-195.
Nakamura, H. (1996). Roles of electrostatic interactions in proteins. Q. Rev. Biophys. 29, 1-90.
Nicholls, A., Sharp, K.A., and Honig, B. (1991). Protein folding and association: Insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins: Struct. Funct. Genet. 11, 281-296.
Oberoi, H. and Allewell, N.M. (1993). Multigrid solution of the nonlinear Poisson-Boltzmann equation and calculation of titration curves. Biophys. J. 65, 48-55.
Oberoi, H., Trikha, J., Yuan, C., and Allewell, N.M. (1996). Identification and analysis of long-range electrostatic effects in proteins by computer modeling: Aspartate transcarbamylase. Proteins: Struct. Funct. Genet. 25, 300-314.
Oliveberg, M. and Fersht, A.R. (1996). Formation of electrostatic interactions on the protein-folding pathway. Biochemistry 38, 2726-2737.
Onsager, L. (1936). Electric moments of molecules in liquids. J. Am. Chem. Soc. 58, 1486-1493.
Perakyla, M. and Pakkanen, T.A. (1995). Model assembly study of the ligand binding by p-hydroxybenzoate hydroxylase: Correlation between the calculated binding energies and the experimental dissociation constants. Proteins: Struct. Funct. Genet. 21, 22-29.
Reddy, M.R., Viswanadhan, V.N., and Weinstein, J.N. (1991). Relative differences in the binding free energies of human immunodeficiency virus 1 protease inhibitors: A thermodynamic cycle-perturbation approach. Proc. Natl. Acad. Sci. U.S.A. 88, 10287-10291.
Rogers, N.K. (1986). The modeling of electrostatic interactions in the function of globular proteins. Prog. Biophys. Mol. Biol. 48, 37-66.
Rossomando, E.F. (1990). Ion exchange chromatography. Meth. Enzymol. 182, 309-317.
Schreiber, G. and Fersht, A.R. (1996). Rapid, electrostatically assisted association of proteins. Nature Struct. Biol. 3, 427-431.
Scott, D.L., Mandel, A.M., Sigler, P.B., and Honig, B. (1994). The electrostatic basis for the interfacial binding of secretory phospholipases A2. Biophys. J. 67, 493-504.
Shafferman, A., Ordentlich, A., Barak, D., Kronman, C., Ber, R., Bino, T., Ariel, N., Osman, R., and Velan, B. (1994). Electrostatic attraction by surface charge does not contribute to the catalytic efficiency of acetylcholinesterase. EMBO J. 13, 3448-3455.
Sharp, K.A. and Honig, B. (1990). Electrostatic interactions in macromolecules: Theory and applications. Ann. Rev. Biophys. Biophys. Chem. 19, 301-332.
Sharp, K.A., Friedman, R.A., Misra, V., Hecht, J., and Honig, B. (1995). Salt effects on polyelectrolyte-ligand binding: Comparison of Poisson-Boltzmann and limiting law/counterion binding models. Biopolymers 36, 245-262.
Stigter, D., Alonso, D.O.V., and Dill, K.A. (1991). Protein stability: Electrostatics and compact denatured states. Proc. Natl. Acad. Sci. U.S.A. 88, 4176-4180.
Stoddard, B.L. and Koshland, D.E., Jr. (1993). Molecular recognition analyzed by docking simulations: The aspartate receptor and isocitrate dehydrogenase from Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 90, 1146-1153.
Tan, Y.J., Oliveberg, M., Davis, B., and Fersht, A.R. (1995). Perturbed pKa values in the denatured states of proteins. J. Mol. Biol. 254, 980-992.
Tanford, C. and Kirkwood, J.G. (1957). Theory of protein titration curves. I. General equations for impenetrable spheres. J. Am. Chem. Soc. 79, 5333-5339.
Tanford, C. and Roxby, R. (1972). Interpretation of protein titration curves. Application to lysozyme. Biochemistry 11, 2192-2198.
Taylor, M.A.J., Baker, K.C., Connerton, I.F., Cummings, N.J., Harris, G.W., Henderson, I.M.J., Jones, S.T., Pickersgill, R.W., Sumner, I.G., Warwicker, J., and Goodenough, P.W. (1994). An unequivocal example of cysteine proteinase activity affected by multiple electrostatic interactions. Protein Eng. 7, 1267-1276.
Tidor, B. (1993). Simulated annealing on free energy surfaces by a combined molecular dynamics and Monte Carlo approach. J. Phys. Chem. 97, 1069-1073.
Thomas, P.G., Russell, A.J., and Fersht, A.R. (1985). Tailoring the pH dependence of enzyme catalysis by protein engineering. Nature 318, 375-376.
Tomasi, J. and Persico, M. (1994). Molecular interactions in solution: An overview of methods based on continuous distribution of the solvent. Chem. Rev. 94, 2027-2094.
Turnbull, J.L., Waldrop, G.L., and Schachman, H.K. (1992). Ionization of amino acid residues involved in the catalytic mechanism of aspartate transcarbamylase. Biochemistry 31, 6562-6569.
Warshel, A. and Russell, S.T. (1984). Calculation of electrostatic interactions in biological systems and in solution. Q. Rev. Biophys. 17, 283-422.
Warwicker, J. (1994). Improved continuum electrostatic modelling in proteins, with comparison to experiment. J. Mol. Biol. 236, 887-903.
Warwicker, J. and Watson, H.C. (1982). Calculation of electrostatic potential in the active site cleft due to α-helix dipoles. J. Mol. Biol. 155, 53-62.
Warwicker, J., Mueller-Harvey, I., Sumner, I., and Bhat, K.M. (1994). The activity of porcine pancreatic phospholipase A2 in 20% alcohol/aqueous solvent, by experiment and electrostatics calculations. J. Mol. Biol. 236, 904-917.
Weber, G. (1975). Energetics of ligand binding to proteins. Adv. Protein Chem. 29, 1-83.
Wyman, J., Jr. (1964). Linked functions and reciprocal effects in hemoglobin: A second look. Adv. Protein Chem. 19, 223-286.
Wyman, J. and Gill, S.J. (1990). Binding and Linkage: Functional Chemistry of Biological Macromolecules. University Science Books, Mill Valley, CA.
Yang, A.-S. and Honig, B. (1993). On the pH dependence of protein stability. J. Mol. Biol. 231, 459-474.
You, T.J. and Harvey, S.C. (1993). A finite element approach to the electrostatics of macromolecules with arbitrary geometries. J. Comp. Chem. 14, 484-501.
Yuan, C., LiCata, V., and Allewell, N. (1996). Effects of assembly and mutations outside the active site on the functional pH dependence of E. coli aspartate transcarbamylase. J. Biol. Chem. 271, 1285-1294.
Zhou, H.X. (1993). Boundary element solution of macromolecular electrostatics: Interaction energy between two proteins. Biophys. J. 65, 955-963.
Zhou, H.X. and Vijayakumar, M. (1997). Modeling of protein conformational fluctuations in pKa predictions. J. Mol. Biol. 267, 1002-1011.
Chapter 4
The Binding of Ions to Proteins
JENNY P. GLUSKER
Abstract
Introduction
Metal Ion Binding to Protein Functional Groups
Examples of Cation Binding in Proteins
Ion Migration in Proteins Containing More Than One Metal
Anion Binding to Protein Functional Groups
Methods of Prediction of Ion-Binding Sites
Acknowledgments
ABSTRACT
The sites on proteins that ions select for binding depend on the charge, cavity size, and chemistry of the space available. Positively charged ions such as metal ions bind to the carboxylate, imidazole, and sulfhydryl groups on the side chains of proteins. The optimal location of metal ions with respect to these functional groups can be found from crystal structures of proteins and of small molecules. Metal ions can be distinguished in terms of their polarizabilities: the less polarizable cations such as Mg2+ bind to oxygen ligands, whereas the more polarizable cations such as Cu+ prefer sulfur as a ligand. Most transition metal ions have properties intermediate between these two. Examples of several studies of metal binding in X-ray crystal structure determinations are presented, together with some information on methods currently in use to identify sites of ion binding when the three-dimensional structure of the protein is known.
Protein: A Comprehensive Treatise Volume 2, pages 99-152. Copyright © 1999 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 1-55938-672-X
INTRODUCTION
The binding of ions to proteins is generally an electrostatic effect. The geometries of the interactions involved are governed by the need for local balance of charge in any area of a protein, as Linus Pauling (1929) noted for the packing of small molecules or ions in crystals. The binding of ionic ligands to proteins must satisfy conditions such as these: the available binding cavity in the protein must be of the appropriate size to enclose the ion, and the charge on the inner surface of this cavity must be approximately balanced by the charge of the ion that is to be bound. The ion will also need to displace other binding groups; some of these, such as water molecules, are readily displaced, while others are not, so that a local conformational change in the protein (to reorient amino-acid side chains) may be necessary.
The main thrust of this article is to consider those geometric and electronic factors that result in the binding of a specific ion (and sometimes of related ions) to sites in proteins to the exclusion of other ions. Results of three-dimensional structure determinations of proteins and protein-ligand complexes by X-ray diffraction methods form the basis of the descriptions in this article. Cations and anions will be considered separately. Studies of ion binding are now so extensive that only a few selected examples illustrating binding types can be given here.
Ions often bind at sites on an enzyme that have been specifically engineered by nature to attract the ions required for successful catalysis of a biochemical reaction. For example, metal ions are often part of the active site of an enzyme, while many enzyme substrates are anions; investigations of the binding of ions to an enzyme are therefore often relevant to an understanding of its mechanism of action. These ions can be replaced by other ions of similar size and charge, but the result of such a change may be an inactive enzyme.
There are three important types of interactions between proteins and ligands:
1. electrostatic interactions, which have little, if any, orientational preference;
2. hydrogen bonding, which is generally highly directional (Umeyama and Morokuma, 1977; Taylor and Kennard, 1984; Jeffrey, 1987); and
3. weaker interactions such as C-H···O interactions, which are mainly found in the more hydrophobic areas of proteins and can help to align ligand molecules containing functional groups (Sutor, 1963; Gould et al., 1985; Burley and Petsko, 1988).
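Pauling's (1929) local charge-balance principle, invoked above, is usually made quantitative through his electrostatic valence rule: a cation of charge z surrounded by ν ligands contributes an electrostatic bond strength s = z/ν to each ligand, and the bond strengths reaching a given anion should sum approximately to the magnitude of its charge. The sketch below applies this bookkeeping to the textbook rock-salt case of MgO. The helper names are hypothetical, and real protein sites (with water, hydrogen bonds, and partial charges) satisfy the rule only approximately.

```python
from fractions import Fraction


def bond_strength(cation_charge, coordination_number):
    """Pauling electrostatic bond strength s = z / nu, as an exact fraction."""
    return Fraction(cation_charge, coordination_number)


def valence_sum(contacts):
    """Sum of bond strengths reaching one anion.

    contacts: iterable of (cation_charge, cation_coordination_number),
    one entry per cation bonded to the anion.
    """
    return sum((bond_strength(z, nu) for z, nu in contacts), Fraction(0))


if __name__ == "__main__":
    # MgO (rock salt): each O2- is bonded to six octahedrally coordinated Mg2+.
    s_mg = bond_strength(2, 6)
    total = valence_sum([(2, 6)] * 6)
    print(f"s(Mg2+, CN 6) = {s_mg}; sum at O2- = {total} (ideal: 2)")
```

Exact fractions are used so that a perfectly balanced site sums to exactly the anion charge, without floating-point round-off.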
Of these three types of interactions, the electrostatic and hydrogen-bonding components will mainly be considered here. Several of the amino-acid side chains in proteins are ionized at neutral pH and therefore, under physiological conditions, attract ions of the opposite charge. Some of these possibilities are diagrammed in Figure 1 and listed in Table 1. The carboxyl groups on aspartic and glutamic acid residues are the main attractants of positively charged cations such as metal ions, but there are other functional groups that also bind metal or hydrogen ions. Examples are provided by histidine, methionine, asparagine, cysteine, and glutamine side chains, main-chain carbonyl (C=O) groups, and the hydroxyl groups on tyrosine, serine, and threonine. Negatively charged groups are more often held in place by means of hydrogen bonds to positively charged side chains such as those of arginine, histidine, and lysine, and to main-chain amino (-NH-) groups. Arginine side chains play a particularly important role in aligning anions such as carboxylate and phosphate groups and holding them rigidly (Borders et al., 1994; Shimoni and Glusker, 1995).
[Figure 1: side-chain structures of Asp, Glu, His, Lys, Arg, Ser, Thr, Asn, Gln, and Tyr]
Figure 1. Amino-acid side chains that bind ions. Directions of binding (hydrogen bonding, and, in some cases, metal-ion binding) are indicated by open arrows.
Table 1. Amino-Acid Side Chains That Bind Ions
(a) Negatively charged amino acids: Aspartate (Asp), Glutamate (Glu)
(b) Positively charged amino acids: Histidine (His), Lysine (Lys), Arginine (Arg)
(c) Hydrogen bond donors: Main-chain amino (-NH-), Asparagine (Asn), Glutamine (Gln), Arginine (Arg), Lysine (Lys), Tyrosine (Tyr), Serine (Ser), Threonine (Thr), Cysteine (Cys), Aspartic acid (Asp), Glutamic acid (Glu)
(d) Hydrogen bond acceptors: Main-chain carbonyl (C=O), Asparagine (Asn), Glutamine (Gln), Aspartate (Asp), Glutamate (Glu), Histidine (His)
Positively charged ions that can bind to proteins include hydrogen ions, metal ions, and other cations such as ammonium and substituted ammonium ions. Of these, hydrogen ions are difficult to locate; the X-ray diffraction method does not generally reveal them in the electron-density maps of proteins because the resolution is not high enough. They can be found by neutron diffraction studies and by NMR measurements. Their positions can also be inferred from hydrogen-bonding motifs, where presumably the hydrogen ion can be transferred back and forth along certain hydrogen bonds, as has been suggested for α-chymotrypsin (Blevins and Tulinsky, 1985; Tsukada and Blow, 1985).
Most of the information on ion binding has come from X-ray diffraction and NMR studies; this chapter will concentrate on results from X-ray diffraction studies. In the early stages of protein structure determination by X-ray diffraction, it is usual to soak solutions of heavy-atom-containing compounds into protein crystals. These compounds enter the crystal via the water channels, which make up, on average, about half of the volume of the crystal, and can then attach to side chains and some main-chain atoms in the protein.
X-ray diffraction data from two or more heavy-atom derivatives of the protein are used to determine the relative phases of the diffracted X-ray beams so that an electron-density map can be calculated. For example, when heavy-atom derivatives were made during the crystal structure determination of D-xylose isomerase, the uranyl-containing groups were found in the metal-binding site of the enzyme, while platinum- and mercury-containing groups were located on the exterior of the enzyme (H.L. Carrell et al., 1984).
In high-resolution electron-density maps of proteins (about 1.7 Å resolution or better), it is often possible to distinguish high peaks surrounded by other peaks at about 2.0-2.4 Å. Arrangements of atoms with these separations imply that a metal ion has been bound in the protein. By contrast, covalent bond distances are of the order of 1.2-1.5 Å, hydrogen bond distances are 2.7-3.1 Å, and van der Waals contacts are generally 3.4-3.7 Å. These different types of bonding can therefore usually be distinguished by the interatomic distances found, although distances from ligand atoms to very large cations such as potassium ions approximate those found in hydrogen bonds.
When perusing reports of crystal structure determinations, one must be careful to note the resolution of the structure determination (Glusker and Trueblood, 1985). If this is in the region of 1.6-1.9 Å, the resolution is good for a protein structure determination, and most of the side chains as well as the main chain will be located, together with many water molecules bound to the protein. Most of the protein structures discussed here are at fairly high resolution. The appearance of a ring structure at different resolutions is diagrammed in Figure 2.
[Figure 2: electron density of a three-ring structure, panels (a) to (c)]
Figure 2. Resolution of a three-ring structure at (a) 2.5 Å resolution, (b) 1.5 Å resolution, and (c) 0.8 Å resolution. Protein structures are generally determined at the resolution shown in (a) and (b). Small molecules are done at the resolution shown in (c) or better.
When protein side-chain groups are fitted to an electron-density map, the model of an entire side chain is fitted as well as possible, possibly with some rotation about bonds at the end of a long side chain. Then the model is refined. Unless the resolution of the structure is very high, there are generally not enough data for an independent refinement of each atomic position and temperature factor. The refinement is therefore applied to the side chain as an entity rather than to individual atoms. As a result, minor structural variations cannot be identified at the lower resolutions typical of protein structure determination. This must always be kept in mind when interpreting the results of protein structure determinations (Bernstein et al., 1977).
The identities of atoms in macromolecular structure determinations are sometimes in question. The X-ray scattering of an atom is proportional to its atomic number; this is not true for neutron scattering, which is why hydrogen atoms (which have the lowest atomic number of all) can be located more readily by neutron diffraction. Experimental problems, together with the greatly increased number of atoms that must be located in a neutron-diffraction experiment, make this method of protein structure determination more difficult than X-ray diffraction, so it is rarely used. The electron-density map obtained from an X-ray diffraction study gives the number of electrons per Å3 at each point of a three-dimensional grid of chosen spacing. Therefore, if the electron-density values at the n grid points that cover the region occupied by an ion are summed, an electron count can be obtained by multiplying the mean of these n values by the volume (in Å3) that they cover, or equivalently by multiplying the sum by the volume associated with a single grid point.
The identity of a metal ion in a macromolecular crystal structure determination can be obtained in this way from a high-resolution electron-density map. The height of a peak in an electron-density map is related to the atomic number of the atom it represents, but this height is also affected by the temperature factor (more correctly called the displacement parameter) of the atom, as diagrammed in Figure 3 (Glusker and Trueblood, 1985). The atomic coordinates define the position of each atom in three dimensions in the repeat unit (the unit cell). The displacement parameter describes the extent to which an atomic position varies from unit cell to unit cell throughout the millions of unit cells in a crystal. The frequency of X-rays is much higher than that of atomic and molecular vibrations, so the X-ray diffraction experiment records instantaneous snapshots of the displaced atoms, averaged over all the unit cells. The refined displacement parameter of an atom can therefore reveal that the model has assigned the wrong atomic number to that atom. Thus, if on refinement the temperature factor of one atom is very low compared with those of the surrounding atoms, the atomic number of that atom in the model is probably too low and should be increased (a low temperature factor sharpens the calculated peak and so raises it toward the observed density). If the temperature factor is very high, the atomic number of that atom in the model is probably too high; the refinement is then attempting to reduce the contribution of this atom to each structure factor.
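The interplay between peak height, atomic number, and displacement parameter can be illustrated with a deliberately idealized model, assumed here only for illustration: a point atom with Z electrons smeared by isotropic Gaussian displacement. With the standard relation B = 8π²⟨u²⟩, such an atom is a three-dimensional Gaussian of variance B/(8π²) in each direction, so its peak height is Z(4π/B)^(3/2). The model ignores the atom's own electron cloud and the resolution limit of the data, so the numbers are qualitative; the helper names are hypothetical.

```python
import math


def peak_height(z, b):
    """Peak density (e/A^3) of a point atom with z electrons smeared by isotropic B (A^2).

    Idealized model: B = 8*pi^2*<u^2>, so the atom is a 3D Gaussian with
    variance B / (8*pi^2) along each axis; its peak is z * (4*pi/B)**1.5.
    """
    return z * (4 * math.pi / b) ** 1.5


def refined_b(z_model, observed_peak):
    """B that lets a model atom with z_model electrons reproduce an observed peak height."""
    return 4 * math.pi * (z_model / observed_peak) ** (2 / 3)


# Same number of electrons, larger B: lower, broader peak (compare Figure 3).
for b in (10.0, 20.0, 40.0):
    print(f"B = {b:4.1f} A^2 -> peak {peak_height(10, b):.3f} e/A^3")

# A site that truly holds 10 electrons with B = 20 A^2 ...
observed = peak_height(10, 20.0)
# ... modelled with too few, the right number, or too many electrons.
for z in (8, 10, 12):
    print(f"model Z = {z:2d} -> refined B = {refined_b(z, observed):.1f} A^2")
```

Assigning 8 electrons to a site that really holds 10 makes the fitted B drop (to about 17 Å2 here), and assigning 12 makes it rise, which is the diagnostic described in the paragraph above.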
Ion Binding to Proteins
Figure 3. Atomic displacement parameters (temperature factors): profiles of atoms in an electron-density map. The vertical axis represents electron density in electrons per Å3. (a) An atom with a small displacement parameter. (b) An atom with a larger displacement parameter. In both (a) and (b) the integrated electron density (the number of electrons in the atom) is similar, but the electron density is more spread out in (b), and the peak height is lower.
The shapes of the scattering-factor curves for different types of atoms vary slightly; because metal ions are positively charged, they hold their electrons more tightly and so have a sharper contour in an electron-density map than do monatomic anions. As a result, the peak height is greater for a cation than for an anion with the same number of electrons. This means that magnesium ions (atomic number 12, charge +2, and therefore 10 electrons) give higher, narrower peaks than do fluoride ions (atomic number 9, charge -1, and therefore also 10 electrons), provided the map resolution is high enough.
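The electron bookkeeping behind the Mg2+/F- comparison is simply atomic number minus charge. The short sketch below applies it to a few ions of the 10-electron (neon-like) series; the choice of ions is illustrative and not taken from the text. Because all of them carry the same number of electrons, differences in their peak heights at equal displacement parameters reflect how compactly the electrons are held, not how many there are.

```python
# (atomic number, formal charge) for a few neon-like ions; illustrative selection.
IONS = {
    "Na+": (11, +1),
    "Mg2+": (12, +2),
    "Al3+": (13, +3),
    "F-": (9, -1),
    "O2-": (8, -2),
}


def electrons(atomic_number, charge):
    """Electrons carried by an ion: atomic number minus formal charge."""
    return atomic_number - charge


for name, (z, q) in IONS.items():
    print(f"{name:5s} Z = {z:2d}, charge {q:+d}, electrons = {electrons(z, q)}")
```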
METAL ION BINDING TO PROTEIN FUNCTIONAL GROUPS
The best protein binding sites for ions are those that have been engineered by nature to attract biochemically relevant ions. Proteins, including enzymes, often have specific metal-ion binding sites, and they bind these cations in a variety of ways and for a variety of reasons. Metal ions are better for catalysis than hydrogen ions because they generally have a higher charge and can be present at reasonable concentrations at neutral pH. If a metal ion takes part in the catalytic mechanism of an enzyme, it may bring specific functional groups together in the relative orientation most appropriate for reaction, it may take part in oxidation-reduction reactions, or it may shield negative charges electrostatically so that a negatively charged substrate can approach the active site. Alternatively, the metal ion may help stabilize the active site so that the catalyzed reaction can be highly stereospecific. Nature has chosen several interesting mechanisms for selecting a specific cation to bind at a given site on a protein, rather than simply accepting any cation with a suitable charge and size. The chemistry of the metal ion, as well as its size and charge, is taken into account. Other ions may also bind if they mimic the intended ions in shape and size, but their chemistry may differ and the result may be an inactive enzyme. For example, sulfate ions bind well at sites meant for phosphate groups (which have a similar size); metal
Table 2. Relative Concentrations of Cations (mM)

Medium                   Sodium   Magnesium   Calcium   Potassium
Fluids in cells              11       2.5       10^-4         92
Fluids outside cells        160       2         2             10
Sea water                   450      52        10             10
ions bind at sites intended for other metal ions of similar size and charge. How well these foreign ions bind depends on how well the lining of the cavity suits their individual chemistries. In the body, few metal cations are available at high concentrations for binding to proteins. Ion concentrations in the body are high only for Na+, K+, Mg2+, and Ca2+, as shown in Table 2. These concentrations (except in the case of potassium ions) generally lie between those for sea water and pure water. Magnesium ions have about the same concentration within the cell as in the surrounding extracellular fluids. Potassium ions are present at higher concentrations in the cell, while sodium ions are largely excluded from it by a membrane-bound ion pump specifically designed for the purpose. Calcium ions, because they readily form insoluble salts that might cause problems within the cell, are found mainly in the extracellular fluids and in bone. These four metal ions have no unshared valence electrons; they bind by purely electrostatic interactions. In addition, because they are not readily deformed (polarized) by the electric field of a neighboring atom, they are called "hard" and tend to bind to hard ligands, particularly oxygen (Table 3) (Ahrland et al., 1958; Pearson, 1963). Very few enzymes use sodium or potassium in their catalytic mechanisms because of concentration problems; that is, the potassium ion concentration is very
Table 3. Hard and Soft Metal Ions

(a) Characteristics of the metal ions
Hard:        H+, Li+, Na+, K+, Be2+, Mg2+, Ca2+, Sr2+, Mn2+, Al3+, Cr3+, Co3+, Fe3+
Borderline:  Fe2+, Co2+, Ni2+, Cu2+, Zn2+, Pb2+
Soft:        Cu+, Ag+, Au+, Tl+, Cd2+, Hg2+, Pd2+, Pt2+

(b) Stabilities of complexes
With hard cations:  F > Cl > Br > I;  O >> S > Se > Te;  N >> P > As > Sb
With soft cations:  F < Cl < Br < I;  O << S ~ Se ~ Te;  N << P > As > Sb
high in the cell and the sodium ion concentration is low, controlled by the ion pump. Additional control of the concentrations of these ions by enzyme-mediated agents would be difficult (Glusker, 1991, 1994). Therefore many proteins in the cell use the remaining ions present at high concentrations, calcium and magnesium ions, in a variety of ways. Certain proteins have engineered cavities that can bind these metal ions specifically. There are, however, many different types of biochemical reactions that must be catalyzed to maintain life. Many of these reactions proceed more readily with "softer" metal ions, that is, ions such as the transition metal ions that are more readily deformable than the alkali metal and alkaline earth cations. These softer metal ions are present only in trace amounts in the cell, but they can be selected by engineering the appropriate binding site within the protein, as will be described. There are two types of enzymes that bind metal ions: the metalloenzymes, which tightly bind transition metal ions, and the metal-activated enzymes, which loosely bind alkali metal and alkaline earth metal ions. Of the less common elements used by enzymes, probably the most important are divalent zinc, copper, and iron, which are bound to many enzyme systems and take part in their catalytic mechanisms. We wish to know which ions will bind to a given protein, how they are bound, and what they do within the protein under physiological conditions. Metal ion-binding sites on proteins are selective if they provide a cavity of the diameter needed to just accommodate the required metal ion and contain enough negative charge to neutralize the charge on the metal ion. They should also provide binding groups with the appropriate deformability (hard or soft).
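The hard/soft matching rule can be expressed as a small lookup, sketched below in Python. The ion classes are transcribed from Table 3(a), and the donor ordering (carboxylate oxygen hard, histidine nitrogen somewhat softer, cysteine sulfur soft) follows the text; the function names and the treatment of borderline ions are illustrative assumptions, and real selectivity also depends on cavity size, charge, and geometry, as described above.

```python
# Hard, borderline, and soft cations as listed in Table 3(a).
HSAB_CLASS = {
    "hard": {"H+", "Li+", "Na+", "K+", "Be2+", "Mg2+", "Ca2+", "Sr2+",
             "Mn2+", "Al3+", "Cr3+", "Co3+", "Fe3+"},
    "borderline": {"Fe2+", "Co2+", "Ni2+", "Cu2+", "Zn2+", "Pb2+"},
    "soft": {"Cu+", "Ag+", "Au+", "Tl+", "Cd2+", "Hg2+", "Pd2+", "Pt2+"},
}

# Protein donor atoms ordered from hard to soft, as characterized in the text.
DONORS_HARD_TO_SOFT = ["O (Asp/Glu carboxylate)", "N (His imidazole)", "S (Cys sulfhydryl)"]


def hsab_class(ion):
    """Return 'hard', 'borderline', or 'soft' for an ion listed in Table 3."""
    for label, members in HSAB_CLASS.items():
        if ion in members:
            return label
    raise KeyError(f"{ion} is not listed in Table 3")


def donor_ranking(ion):
    """Rank protein donor atoms for an ion by the hard-hard / soft-soft rule.

    Borderline ions get an N-first order; this is a simplifying assumption
    (it matches the ligand order quoted later for catalytic zinc), not a general law.
    """
    label = hsab_class(ion)
    if label == "hard":
        return list(DONORS_HARD_TO_SOFT)
    if label == "soft":
        return DONORS_HARD_TO_SOFT[::-1]
    oxygen, nitrogen, sulfur = DONORS_HARD_TO_SOFT
    return [nitrogen, oxygen, sulfur]


for ion in ("Mg2+", "Zn2+", "Hg2+"):
    print(ion, hsab_class(ion), "->", donor_ranking(ion)[0])
```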
We will first consider the relationships of metal ions to their binding groups; carboxylates, imidazoles, and sulfhydryl groups are the most common metal-binding groups in proteins. Of these, the oxygen atoms of carboxylate groups can be considered hard, the sulfur atoms of sulfhydryl groups soft, and the nitrogen atoms of histidine imidazole groups somewhat softer than oxygen atoms (see Table 3). The relative stabilities of complexes of the borderline ions with a given ligand are expressed by the Irving-Williams series (Irving and Williams, 1953). In this series, shown below, the ionic radius decreases from left to right, while the ionization potential increases:
Ba2+ < Sr2+ < Ca2+ < Mg2+ < Mn2+ < Fe2+ < Co2+ < Ni2+ < Cu2+ > Zn2+
The main protein side-chain groups that bind metal ions are the carboxyl groups of aspartic and glutamic acids. The positions of metal ions relative to carboxyl groups in metal ion-carboxylate interactions have been investigated in our laboratory (C. J. Carrell et al., 1988). Each oxygen atom of an ionized carboxylate group has two lone pairs of electrons. We asked which lone pair is preferred for metal cation binding: the one that is syn or the one that is anti to the other C-O bond (Gandour, 1981) (see Figure 4). The C-COO carboxylate group is planar. Where do the metal ions bind with respect to this plane? To investigate these geometrical questions, we examined small-molecule crystal structures in the Cambridge Structural Database (Allen et al., 1979). These crystal structures are
Figure 4. Lone-pair electrons on a carboxylate group: (a) syn, (b) anti, and (c) the situation in which both oxygen atoms are equally shared (bidentate).
determined to a much higher resolution than those of proteins and, when examined statistically, give important information on general binding modes. Therefore all structures in which an isolated carboxylate group bound a metal ion were examined. The scatterplot of metal ion positions around a carboxylate group was contoured to give an overall probability density of points (Rosenfield et al., 1977, 1984; Murray-Rust and Glusker, 1984), which makes visualization easier, as shown in Figure 5. In these statistical analyses, we also investigated whether or not metal ions lie in the plane of the carboxyl group, and which factors determine whether a metal ion shares both oxygen atoms of the carboxyl group equally (Einspahr and Bugg, 1981; C. J. Carrell et al., 1988). Results for sodium, magnesium, potassium, calcium, and divalent manganese, iron, copper, and zinc are shown in Figure 5. They are presented as contoured scatterplots of the probability of finding the metal ion at a given position, viewed both onto the plane of the carboxylate group and along this plane. The latter view shows the extent to which the metal ion deviates from the plane of the carboxylate group. As had been assumed, metal ions in general prefer the syn lone pairs for binding. In addition, we found that the metal ion generally lies in the plane of the carboxyl group, the main exceptions being the alkali metal ions sodium and potassium, as shown in Table 4. These metals ionize readily and form strong bases, so it is not surprising that they have less specific modes of binding; they bind in all possible orientations with respect to the carboxylate group. This study also indicated whether a metal ion binds one or both oxygen atoms of a carboxylate group. When the distance between the metal cation and a carboxylate oxygen atom is expected to be in the range 2.3 to 2.6 Å, the metal ion will often share both oxygen atoms of the
Figure 5. Contoured scatterplots of the metal-binding sites around carboxylate ions. Shown are results for Na+, Mg2+, K+, Ca2+, Mn2+, Fe2+, Cu2+, and Zn2+. The carboxylate group is indicated by a line diagram. The upper diagram is viewed perpendicular to the plane of the carboxylate group, while the lower diagram is viewed along this plane. Note the syn and anti binding for Mg2+ and Mn2+ and the bidentate binding for Ca2+.
carboxylate group equally ("direct" or bidentate binding). This seems to be a consequence of the nonbonded O-O distance of 2.2 Å in carboxylate groups and of the need to keep the O-M-O angles (M = metal) larger than 60°. As shown in Figure 5, calcium ions form this type of interaction readily, while magnesium ions, which are smaller, do not. These findings apply also to metal ion binding in protein structures
Table 4. Deviations of Selected Ions from the Carboxylate Plane

(a) Approximately in plane
    General: Li, Be, Mg, In, Sb, Tl, Pb(IV)
    Transition metal: Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Mo, Ru, Rh, Pd, Re, Os, Ir, Pt, Au
    Lanthanide/actinide: Pr, Gd, Er, Yb, Np(VI)
(b) Up to 0.5 Å out of plane
    General: Ca, Sn(IV)
    Transition metal: Nb, Mo(V)
    Lanthanide/actinide: La, Nd, Sm, Dy, Tm, Np(V)
(c) Up to 1.5 Å out of plane
    Transition metal: Cd
    Lanthanide/actinide: Ce, U(VI), Am
(d) Over 1.5 Å out of plane
    General: Na, K, Rb, Sr, Cs, Ba, Pb(II)
    Transition metal: Ag(I), Hg(II)
Figure 6. Binding to histidine side chains: (a) binding of a metal ion, M2+, and (b) binding by hydrogen bonding in α-chymotrypsin.
(Chakrabarti, 1990a; H. L. Carrell et al., 1989; Pascard, 1995). A list of cation-oxygen distances is given in Table 5. In a similar way, when metal ions bind to the imidazole groups of histidine side chains, the metal ion generally lies in the plane of the ring system. Since nitrogen is somewhat softer than oxygen, the slightly more deformable transition elements, particularly divalent zinc and copper ions, are the ions generally found bound (Chakrabarti, 1990b). An analysis of metal ion binding (A. B. Carrell et al., 1993) indicates that out-of-plane deviations have average values of 3-4° (6° in proteins) (Chakrabarti, 1990b). Metal ions thus remain more rigorously in the plane of the histidine ring than do hydrogen-bonded groups. Ab initio molecular orbital calculations (A. B. Carrell et al., 1993) also indicate that the energy cost of deviating from the plane is greater for divalent zinc (a transition element) than for divalent magnesium (a hard alkaline earth cation). In general, but not always, the metal ion is bound to Nε2 of histidine while Nδ1 carries a proton, as shown in Figure 6a (Chakrabarti, 1990b; A. B. Carrell et al., 1993). Thus metal ions lie within 5-10° of the imidazole plane, along the direction of the nitrogen lone pair. When carboxyl and hydroxyl groups form hydrogen bonds to a histidine ring, their oxygen atoms lie near the ring plane, but the rest of the carboxyl or hydroxyl group
Table 5. Average Metal Ion-Oxygen Distances^a

Distance range   Cations
1.80-1.89 Å      Mn(III), Co(III)
1.90-1.99 Å      Li(I), Cr(III), Cu(I), Pt(I), V(IV)
2.00-2.09 Å      V(III), Fe(III), Rh(III), Cu(II)^b, Mg(II), Re(III)
2.10-2.19 Å      Ru(III), Zn(II), Fe(II)^b, Co(II), Ni(II), Tc(V), Mo(IV)
2.20-2.29 Å      W(II,IV), Mo(V), Mn(II), Nb(V), Sn(II)
2.30-2.39 Å      Sn(IV)^b, Dy(III), Ag(I)^b, Cd(II), Hg(II)^b, Tl(III)
2.40-2.49 Å      Ca(II)^b, Sm(III)^b, Na(I)^b, Nd(III), U(VI)
2.50-2.59 Å      Ce(IV), La(III)
2.60-2.69 Å      Sr(II)^b
2.70-2.79 Å      Ba(II)
2.80-2.89 Å      K(I)^b
2.90-2.99 Å      Rb(I)

Notes: ^a Distances vary with the coordination number; they are shorter when the coordination number is lower. Cation-nitrogen distances are approximately the same, but cation-sulfur distances are somewhat longer. ^b Minimum value 0.2 Å (or more) less than this value, implying a variable cation-oxygen distance.
lies out of the plane, as found in chymotrypsin (Figure 6b) (Tsukada and Blow, 1985; Blevins and Tulinsky, 1985). The binding of metal ions to the sulfhydryl groups of cysteine residues has been studied in a similar manner (Chakrabarti, 1989). Metals that commonly bind to this group are Fe(II), Zn(II), Cu(I), Ag(I), and Hg(II); the last two are often used to prepare heavy-atom derivatives of proteins for phase determination. The binding modes of free cysteine are pH-dependent (Vallee and Auld, 1990; Vedani and Huhta, 1990): at high pH the amino group is uncharged (NH2) and binds metals, but at lower pH it is protonated (NH3+) and is less likely to bind metal ions. For example, at high pH Zn2+ binds to the nitrogen and sulfur atoms of cysteine, but at lower pH it binds to the oxygen and sulfur atoms. Citrate, the substrate of several enzymes such as aconitase (Lauble et al., 1992), is a multidentate ligand that can bind many types of metal ions and organic cations. It contains three carboxylate groups, one of which is part of an α-hydroxycarboxylate group. The effect of neighboring groups on carboxylate binding has been analyzed by studies of α-hydroxy- and α-fluorocarboxylates (H. L. Carrell et al., 1987). When a metal cation binds across this chelating group, it may lie in or out of the plane, depending on the size of the cation and the presence of neighboring groups. If no metal ion binds across the group, then the hydrogen atom of the hydroxyl group occupies that position, forming an internal hydrogen bond. This is not an option for the α-fluoro derivative (which has no such hydrogen atom), and a metal ion will therefore bind there if at all possible (H. L. Carrell et al., 1987). This is diagrammed in Figure 7.
Another binding motif involves a metal ion and a carboxylate ion that share a water molecule. This arrangement is common in crystal structures of small molecules and proteins (Kaufman et al., 1993; H. L. Carrell et al., 1994). The next question is: which metal ions are most likely to bind to a carboxyl, imidazole, hydroxyl, or sulfhydryl group? As already mentioned, oxygen atoms are hard, nitrogen atoms somewhat softer, and sulfur atoms soft (deformable). Each will preferentially bind a metal ion with the same character (hard or soft). To quote Mildvan (1970): "Cations that indulge in ionic bonding prefer ligands that so indulge; cations that indulge in covalent bonding prefer ligands that so indulge." This is a rephrasing of the maxim that hard acids prefer to coordinate with hard bases and soft acids with soft bases. It is therefore not surprising that the hard cation magnesium, which prefers to form ionic bonds, has a high affinity for oxygen atoms. In this respect, magnesium ions are very different from zinc ions, which, even though they are approximately the same size and charge as magnesium ions, readily bind nitrogen and sulfur atoms as well as oxygen atoms. While magnesium binds six oxygen atoms with an octahedral coordination that is rarely perturbed, zinc has a flexible coordination sphere and can show coordination numbers of 4, 5, or 6. This may in part explain why zinc is involved in so many catalytic mechanisms (Bock et al., 1995). Magnesium structures with coordination numbers other than 6 tend to contain rigid polycyclic ligands with specific planar coordination properties. Those with a coordination number of 5 generally involve porphyrins or similar polycyclic ligands; in each case, one axial position is filled by solvent, often water.
Those with a coordination number of 7 mostly involve crown ethers or analogous ligands and, in this case, have water in the two axial positions. Several of the four-coordinate structures have an ether such as diethyl ether, dioxane, or tetrahydrofuran as an adduct, but
Figure 7. Metal ion binding to an α-substituted carboxyl group. (a) Internal hydrogen bonding in an α-hydroxycarboxylate. (b) Metal ion (M+) binding to the same carboxylate derivative. (c) The fluoro analogue, which has no hydrogen atom available for hydrogen bonding and binds to the metal ion. Note that in each case a chelate is formed.
there are no cases in which water is part of a four-coordinate arrangement around a magnesium cation. There is, however, a high frequency of crystal structures in which the magnesium cation is surrounded by six water molecules to give a [Mg(H2O)6]2+ cation. This is in line with the low rate of water exchange of magnesium (Frey and Stuehr, 1974). Calcium ions, like magnesium ions, tend to gather hard oxygen atoms around them. Because they are larger, they generally have coordination numbers of 7 or 8, rather than 6 as found for magnesium ions. Calcium ion-ligand distances range from 2.1 to 2.8 Å. When the coordination number is six, the arrangement is octahedral; when it is seven, the ligands approximately occupy the corners of a pentagonal bipyramid with five oxygen atoms in the equatorial plane; and eight ligands are arranged at the corners of a square antiprism (which resembles a cube with the top face rotated by 45°) (Strynadka and James, 1994). Hydroxyl groups from serine, threonine, or tyrosine side chains are rarely seen as calcium ligands in proteins. In all cases, the geometry is far from ideal. Generally there is a main-chain carbonyl oxygen atom and often a water molecule in the site. Aspartate and glutamate side chains are common ligands, presumably because of their charge. None of the six-coordinate sites has a bidentate ligand, but most with a coordination number of seven do (generally a bidentate carboxylate side chain). The sites in proteins with the highest affinity for calcium have zero, or perhaps one, water molecule, never more (Strynadka and James, 1994). If a water molecule is present, it is replaced by substrate. Another factor in binding to calcium ions, however, appears to be the nature of the amino-acid residues in the second coordination sphere and possibly even further away from the calcium ion.
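To see how coordination number constrains ligand packing, the sketch below builds idealized octahedral, pentagonal bipyramidal, and square antiprismatic sites (the last one built literally as a cube whose top face is turned by 45°, following the description above) and reports the closest ligand-ligand contact. The idealized shapes, the function names, and the single 2.4 Å cation-oxygen distance are assumptions for illustration; as noted, real calcium sites are far from ideal.

```python
import math
from itertools import combinations


def ideal_sites(coordination_number, d):
    """Idealized ligand positions (angstroms) around a cation at the origin.

    6: octahedron; 7: pentagonal bipyramid (five equatorial ligands);
    8: square antiprism, taken here as a cube inscribed in a sphere of
       radius d with its top face turned by 45 degrees.
    """
    if coordination_number == 6:
        return [(d, 0.0, 0.0), (-d, 0.0, 0.0), (0.0, d, 0.0),
                (0.0, -d, 0.0), (0.0, 0.0, d), (0.0, 0.0, -d)]
    if coordination_number == 7:
        equator = [(d * math.cos(2 * math.pi * k / 5),
                    d * math.sin(2 * math.pi * k / 5), 0.0) for k in range(5)]
        return equator + [(0.0, 0.0, d), (0.0, 0.0, -d)]
    if coordination_number == 8:
        h, r = d / math.sqrt(3), d * math.sqrt(2 / 3)  # cube-corner height and radius
        bottom = [(r * math.cos(math.radians(45 + 90 * k)),
                   r * math.sin(math.radians(45 + 90 * k)), -h) for k in range(4)]
        top = [(r * math.cos(math.radians(90 * k)),
                r * math.sin(math.radians(90 * k)), h) for k in range(4)]
        return bottom + top
    raise ValueError("only coordination numbers 6, 7, and 8 are sketched")


def closest_ligand_contact(sites):
    """Shortest ligand-ligand distance in a set of sites."""
    return min(math.dist(p, q) for p, q in combinations(sites, 2))


# Same cation-oxygen distance for all three shapes (2.4 A, within the 2.1-2.8 A range).
for cn in (6, 7, 8):
    sites = ideal_sites(cn, 2.4)
    print(cn, "ligands, closest O...O contact:", round(closest_ligand_contact(sites), 2), "A")
```

At a fixed cation-oxygen distance, each added ligand brings the oxygen atoms closer together (about 3.39, 2.82, and 2.77 Å for 6, 7, and 8 ligands at 2.4 Å), which is one way to see why the larger calcium ion, with its longer cation-oxygen distances, can accommodate seven or eight ligands while the smaller magnesium ion usually takes six.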
One structural motif in proteins that has a strong affinity for calcium ions is the "EF hand," a helix-loop-helix motif in which the ligands around the calcium ion form an approximately pentagonal bipyramidal arrangement, as shown in Figure 8, giving the calcium ion a coordination number of 7 (Kretsinger and Nockolds, 1973; Forsen et al., 1993). All liganding atoms are oxygen. If a magnesium ion binds to such a calcium-binding site (for example, when the calcium concentration is low), the metal-ion binding cavity contracts (by 0.25 Å in the cation-oxygen distances), and one of the oxygen atoms of the bidentate carboxyl group swings away, leaving a coordination number of six (Trewhella et al., 1989; Strynadka and James, 1989). Zinc ions have a greater tendency than magnesium or calcium ions to bind to softer atoms such as nitrogen or sulfur. When they bind to oxygen atoms, their coordination number is usually six, but when they bind to sulfur, it is four. Analyses of zinc binding suggest two types of zinc site: catalytic (directly involved in the catalytic mechanism) and noncatalytic (Vallee and Auld, 1990; Christianson, 1991; Jaffe, 1993). Histidine is the most common ligand of catalytic zinc, with ligand preferences in the order histidine > carboxylate > sulfhydryl, phenol (tyrosine), lysine, or backbone carbonyl (Jaffe, 1993). Noncatalytic zinc has a greater tendency to bind tetrahedrally to at least two, and often four, cysteine ligands, as found in zinc finger proteins (Parraga et al., 1988; Pavletich and Pabo, 1991). The
Figure 8. (a) Binding to a calcium ion in an EF-hand protein. The coordination number is 7 in this type of protein fold. The numbers (1, 3, 5, 7, and 12) refer to the positions of the liganding amino acids in the sequence. (b) The surroundings of a calcium ion in calmodulin (Rao et al., 1993). This diagram was drawn by the program ICRVIEW (Erlebacher and Carrell, 1992). In this and similar diagrams, nitrogen atoms are filled circles and oxygen atoms are stippled circles.
zinc finger contains a β-ribbon-turn-α-helix motif that binds a zinc ion by way of four sulfur atoms (from cysteine residues) or two sulfur and two nitrogen atoms (from cysteine and histidine residues), as shown in Figure 9. This binding site is, however, not unique to zinc ions; for example, such a site with bound divalent iron is found in rubredoxin (Watenpaugh et al., 1980). In contrast to magnesium, which binds six oxygen atoms, zinc readily binds four, five, or six oxygen, nitrogen, and sulfur atoms. This is shown in Figure 10(a) and (b) by an analysis of magnesium and zinc compounds in the Cambridge Structural Database (Allen et al., 1979; Bock et al., 1995). Ab initio molecular orbital calculations on the hydration of magnesium and zinc (Bock et al., 1994; Bock et al., 1995) show that the energy penalty for changing the coordination number of zinc surrounded by water is negligible compared with that for magnesium (Figure 10c). Thus zinc ions can readily
(a) X(3)-Cys-X(2-4)-Cys-X(12)-His-X(3-4)-His-X(4)
Figure 9. The coordination of zinc in a zinc finger (a nucleic acid-binding motif). (a) The amino-acid sequence; X = any amino acid. (b) The protein fold that forms the zinc-binding site. Zn2+ is found to be tetrahedrally bound by two cysteine and two histidine residues, or by four cysteine residues. (c) The surroundings of a zinc ion in the Zif268-DNA complex (Pavletich and Pabo, 1991).
change their coordination number without a large energy cost, which makes them more useful than magnesium ions for carrying out biochemical reactions. Copper ions bind softer ligands such as nitrogen and sulfur atoms and, in the divalent state, also show a Jahn-Teller effect (Jahn and Teller, 1937) caused by the d9 electron configuration of Cu2+. This means that, while four Cu2+-O distances lie in the range 1.96-1.99 Å in copper sulfate pentahydrate and trihydrate (Bacon and Curry, 1962; Zahrobsky and Bauer, 1968), the other two (trans to each other) are 2.40-2.45 Å. This can be described as a tetragonally distorted octahedral coordination. Square planar and square pyramidal geometries of Cu(II)-peptides are also found in crystal structure determinations. Both Cu(II) and Ni(II) induce deprotonation of amino
Figure 10. Atoms that are liganded to (a) magnesium (Mg-O, Mg-N, Mg-S, Mg-Cl, and Mg-Br contacts) and (b) zinc ions, plotted against the Mg2+ or Zn2+ coordination number (4 to 7). Data are obtained from the Cambridge Structural Database. The vertical axis shows the percentage of binding for a given coordination number. (c) Energies from ab initio calculations (relative to a zero value for the hexaaquated ion) for zinc and magnesium ions with 4, 5, and 6 water molecules around them; the horizontal axis gives the numbers of inner- and outer-sphere water molecules (for example, aq4·2aq denotes four inner-sphere and two outer-sphere water molecules). (d) The effect of a zinc ion in polarizing a carbonyl group or a water molecule.
groups better than zinc does. The binding of Cu(II) to the imidazole nitrogen of histidine is strong. Iron is found in biochemical systems in the divalent and trivalent states. The trivalent form is hard, while the divalent form is borderline and readily forms complexes with sulfur as well as with oxygen and nitrogen. Iron-sulfur complexes are found in many enzymes, for example, aconitase (Lauble et al., 1992). In addition, Fe-O-Fe complexes are found in proteins such as hemerythrin. Iron plays a very important role when bound to porphyrins in the heme proteins, where
its valence state can change from the smaller ferric ion to the larger ferrous ion, a change that acts as a trigger of major events in proteins such as hemoglobin. Ammonium ions tend to bind in the same way as metal cations and in fact can be replaced by rubidium ions, which have about the same overall size. The ammonium ion will, however, tend to form hydrogen bonds to its surrounding groups, unlike the rubidium ion, which binds by purely electrostatic interactions. The binding of such organic cations is important in the body, and the binding rules may be complicated. For example, Taylor (1995) has noted that positively charged nitrogen atoms (as in lysine) pack against the aromatic groups of tryptophan side chains. When metal ions bind, they often have a profound effect on the groups they bind to. Zinc lowers the pKa of a bound water molecule to a value near 10 in the hexaaquozinc ion (Bertini et al., 1990). Thus, when a zinc ion binds water, it makes it easier for the water to lose a proton and form a hydroxyl group (which is a strong nucleophile) (Figure 10d). Similarly, when zinc binds to a carbonyl group, it "polarizes" it so that there is a partial positive charge on the carbon atom and a partial negative charge on the oxygen atom. This makes the carbonyl group more susceptible to nucleophilic attack at the carbon atom.
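The practical meaning of a lowered pKa can be sketched with the Henderson-Hasselbalch relation, treating the zinc-bound water as a single acid-base group (a simplifying assumption). The pKa of 10 is the hexaaquozinc value quoted above; the pKa of 7 is a hypothetical comparison value, included only to show how much a further lowering by a protein environment (as in the carbonic anhydrase example that follows) changes the fraction of metal-bound hydroxyl near neutral pH.

```python
def fraction_hydroxide(pka, ph):
    """Fraction of metal-bound water present as M-OH (Henderson-Hasselbalch).

    From pH = pKa + log10([M-OH]/[M-OH2]), the ionized fraction is
    1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10 ** (pka - ph))


# 10.0: hexaaquozinc value from the text; 7.0: hypothetical comparison value.
for pka in (10.0, 7.0):
    for ph in (6.0, 7.0, 8.0):
        print(f"pKa {pka:4.1f}, pH {ph:.1f}: {100 * fraction_hydroxide(pka, ph):6.2f}% as Zn-OH")
```

At pH 7, a pKa of 10 leaves only about 0.1% of the bound water ionized, whereas a pKa of 7 gives 50%.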
EXAMPLES OF CATION BINDING IN PROTEINS
Metal ions can serve to align substrates in the active site of an enzyme, and they may also polarize the substrate so that a reaction can be effected by other side-chain groups acting on it. This is shown by the enzyme mandelate racemase, whose crystal structure has been reported (Neidhart et al., 1991). As diagrammed in Figure 11, the enzyme-bound magnesium ion binds to the α-hydroxy group of the mandelate ion and firmly positions the anion so that the α-hydrogen atom (on the hydroxyl-bearing carbon atom) can be abstracted and replaced from the opposite side, causing racemization. The magnesium ion probably also polarizes the hydroxyl group and so aids the reaction. Another example of the binding of ions in the active site of an enzyme is provided by X-ray crystallographic studies of the zinc-containing enzyme carbonic anhydrase. The metal ion binds to three histidine residues (His94, His96, and His119) and to a water molecule. The positive charge on the zinc ion makes the water molecule more acidic than usual, so that it can serve as a source of hydroxyl groups even at pH values below neutral. The metal-bound hydroxyl group then behaves as a powerful nucleophile (Figure 12). This ionization of water is facilitated by general base catalysis by a hydrogen-bonded network involving the hydroxyl group, threonine 199, and glutamic acid 106: the H2O/OH- group is hydrogen-bonded to the hydroxyl group of Thr199, which is hydrogen-bonded to Glu106. This orients the hydroxyl group so that there is no impediment to the binding of carbon dioxide. Anions that bind and perturb this hydrogen-bond network inhibit the enzyme.
Figure 11. Binding of magnesium ions and substrate (mandelate) to the enzyme mandelate racemase. (a) Diagram of the active site (labeled residues include Lys166, Asp195, Glu221, and Glu247). The role of the metal ion is to hold the substrate in place so that the two side chains (His297 and Lys166) are positioned to attack it from one side or the other. The metal ion probably also helps polarize the substrate to aid the catalytic activity of the enzyme. (b) View of the crystal structure with bound inhibitor atrolactate, which mimics the substrate mandelate (Landro et al., 1994).
Figure 12. Mechanism of action of carbonic anhydrase. The zinc ion activates a water molecule to give a hydroxyl group that can then attack the carbon atom of carbon dioxide.
Metal substitution in the active site of carbonic anhydrase has been studied. Divalent cobalt ions also support the hydration of carbon dioxide, but the cadmium- and manganese-substituted enzymes are inactive. This is presumed to be due to a tendency for cadmium and manganese to have a coordination number of five, compared with four for zinc or divalent cobalt (Garmer and Krauss, 1992). This has been examined by crystal structure analyses of the divalent cobalt-, copper-, nickel-, and manganese-substituted enzymes (Håkansson et al., 1994a). The cobalt-substituted enzyme has tetrahedral coordination around the metal ion at pH 6 and pH 7.8. The copper enzyme has five groups around the metal, four of which are essentially in a plane, with a histidine side chain in the fifth position. The nickel and manganese enzymes have six-coordinate metal ions, with the ligands in an approximately octahedral arrangement. This increased coordination number makes it much more difficult for a carbon dioxide molecule to bind in the correct way in the active site and hence explains the inactivity of the enzymes containing these metal ions. Apparently this spatial problem is more important than any polarizing effect of the metal ion on the hydroxyl group. Thus the relatively strongly bound water molecule in the fifth coordination position in the metal-substituted enzymes acts as an inhibitor of the enzyme. The authors state that "Zinc is probably the best candidate both in its coordination chemistry, its natural abundance, and its weaker interactions with anions and other possible ligands that might disturb the enzyme activity in vivo" (Håkansson et al., 1994a). Metal ions may also bind different parts of a molecule, or two different molecules, together. For example, integrins are involved in cell adhesion and have a ligand-binding function that may be controlled by a carefully engineered metal ion coordination.
The crystal structure of the ligand-binding domain of Mac-1 integrin (Lee et al., 1995; Graves, 1995) indicates that Mg2+ or Ca2+ can serve to bind the protein to its receptor. The metal-ion-binding site is at the carboxyl-terminal end of a β-sheet, and the binding appears to involve serine hydroxyl-magnesium as well as carboxyl-magnesium interactions. Significantly for cell adhesion, this binding brings different molecules together. Azurin is an electron-transfer protein containing a copper ion that changes its oxidation state between Cu(I) and Cu(II). A question of interest is how much the shape of the metal-binding site is determined by the copper ion and how much by the protein fold, which creates a site to which copper readily binds. The copper site is trigonal bipyramidal. It consists of two histidine residues (His46 and His117), one cysteine residue (Cys112), and longer axial bonds to the thioether sulfur atom of Met121 and a peptide carbonyl oxygen atom of Gly45. The crystal structure of apo-azurin has been determined (Shephard et al., 1993). It shows that the ligand side chains move only slightly on removal of the copper ion, so that the radius of the copper-binding cavity is slightly reduced (from 1.31 Å in reduced azurin to 1.24 Å in oxidized azurin to 1.16 Å in apo-azurin). Thus the protein fold largely defines the metal-binding site. When cadmium ions were soaked into apo-azurin, the crystal structure showed that cadmium had occupied the copper site with very little perturbation; the coordination geometry is slightly more tetrahedral than in the copper protein (Blackwell et al., 1994). In the structures of zinc-azurin and a zinc-azurin mutant (Asn47Asp), the coordination of the zinc is approximately tetrahedral (Sjolin et al., 1993), and the methionine side chain is no longer part of the coordination sphere. This shows that small changes can occur in the active site in response to the binding of different cations. Sometimes a protein contains a prosthetic group that has a metal-binding site.
An example is provided by the pyrroloquinoline quinone (PQQ) prosthetic group of the enzyme methanol dehydrogenase, which oxidizes methanol to formaldehyde (Ghosh et al., 1995). As shown in Figure 13, the calcium ion is bound by a (shared) carboxyl group of Glu177 and the amide oxygen atom of Asn261 and, from PQQ, by the C5 quinone oxygen atom, one oxygen atom of the C7 carboxylate group, and the N6 ring nitrogen atom. It has been suggested that the PQQ carbonyl group that binds to the calcium ion facilitates attack on the substrate. The other quinone oxygen atom, marked in Figure 13, may be in the free-radical semiquinone form.
Figure 13. The binding of a calcium ion to the prosthetic group in methanol dehydrogenase. (a) Line diagram and (b) view from crystal structure (Xia et al., 1996).
A metalloenzyme binds its substrate (or a substrate analogue) so that it takes up one or two positions in the coordination sphere of the metal ion. For example, two zinc ions are bound to the catalytic domain of human fibroblast collagenase (Borkakoti et al., 1994). In this crystal structure the zinc ion in one site is surrounded by three histidine residues and one aspartic acid group, while in the other site the zinc ion is bound to three histidine groups and two oxygen atoms from the hydroxamate group of a synthetic inhibitor. Of great interest are those enzymes that bind two different metal ions. Copper-zinc superoxide dismutase provides an excellent example (Tainer et al., 1982). Another is pyruvate kinase, which binds magnesium and potassium (Muirhead et al., 1986). Concanavalin A is a saccharide-binding protein (lectin) from the jack
bean. One metal site binds nickel, cobalt, zinc, manganese, or cadmium, while the other site binds calcium or cadmium (Emmerich et al., 1994). Thus it has a transition-metal binding site and a calcium-binding site. In cadmium-substituted concanavalin A, the cadmium is bound at the transition-metal site and calcium is still present. This cadmium complex has a third, octahedral cadmium-binding site that binds two monomers together (Figure 14) (Naismith et al., 1993).
Figure 14. Metal binding to concanavalin A. (a) The transition-metal binding site. (b) The calcium binding site. (c) An additional site where cadmium binds.
The large proteolytic fragment (the Klenow fragment) of Escherichia coli DNA polymerase I binds both zinc and magnesium ions (Beese and Steitz, 1991). One metal-binding site consists of three carboxylate groups (Asp355, Glu357, and Asp501), the 5'-phosphate group of the deoxynucleoside monophosphate that is
also complexed with the enzyme, and a water molecule. This is the zinc-binding site. The other metal site is octahedrally coordinated to the other oxygen atom of the carboxylate of Asp355, two of the 5'-phosphate oxygen atoms of the deoxynucleoside monophosphate, and three water molecules. This is the magnesium-binding site (Figure 15). The identities of the metal ions in the two sites were determined by comparing the electron density maps obtained when only zinc ions were soaked into the crystal and when a mixture of both magnesium and zinc ions was soaked in. The latter map showed less electron density in the magnesium site. An example of a successfully engineered metal binding site has been described for alkaline phosphatase. This enzyme, so-called because it has a pH optimum above 7.0, is a phosphomonoesterase that acts via a covalent phosphoseryl intermediate and catalyzes the phosphoryl transfer reaction to various alcohols. When a phosphomonoester binds in the active site of this enzyme, a serine hydroxy 1 group (Serl02 in the Escherichia coli enzyme) becomes covalently phosphorylated. The product alcohol dissociates from the phosphorylated enzyme, and water then binds in its place interacting with the phosphoryl group and regenerating free enzyme. Alkaline phosphatase is a dimeric metalloenzyme with two zinc ions and one magnesium ion at each active site (Figure 16). The crystal structures of the enzyme and the cadmium-substituted enzyme have been determined (Sodwadski et al., 1983; Kim and Wyckoff, 1991). A phosphate group in the crystal structure is shown to be bound by arginine side chains and the two zinc ions. The magnesium ion is bound by two carboxyl groups, three water molecules, and the hydroxyl group of Thrl55. One metal-binding site of this protein has been altered in the Escherichia coli enzyme. The replacement of the aspartic acid side chain at position 153 that binds
Figure 15. Binding of metal ions to the Klenow fragment (a portion of DNA polymerase I). The magnesium ion binds the phosphate group on DNA while the zinc ion polarizes a water molecule, which then attacks the phosphorus atom.
125
Ion Binding to Proteins E. coli alkaline phosphatase (wild-type) Asp-51
Glu-322
To be activated (phosphorylated)
Figure 16. The active site of the E. coli enzyme alkaline phosphatase. There are one magnesium and two zinc sites.
both a magnesium-bound water molecule and Lys328 in the wild-type enzyme, by histidine, converts the site from a magnesium-binding to a zinc-binding site. This mutant enzyme, which is inactive unless excess magnesium is added, binds three zinc ions, compared with the two zinc ions and one magnesium ion bound in the wild-type enzyme (Murphy et al., 1993). The third zinc ion is four-coordinate (two carboxyl groups, a threonine hydroxyl group, and His153), as shown in Figure 17; the two water molecules that were bound to the magnesium ion are lost. Thus a change in only one amino acid side chain has converted the octahedral magnesium-binding site into a tetrahedral zinc-binding site with histidine as one of the ligands. As a result, Lys328 is no longer constrained in the mutant (D153H) enzyme and is free to bind to the phosphate group, as shown in Figure 17.
Figure 17. The active site of an inactive mutant (Asp153His) of E. coli alkaline phosphatase. The third metal-binding site in this mutant enzyme binds zinc rather than magnesium.
Both conserved zinc ions bind to a single oxygen atom of the phosphate group, unlike the wild-type enzyme, in which these two zinc ions bind different oxygen atoms of the phosphate group. The binding of the catalytically important Ser102 hydroxyl group to the second (conserved) zinc ion also contributes to the loss of activity. When excess magnesium ions are added, the enzyme regains its activity. Unfortunately, it was not possible to crystallize the mutant enzyme in the presence of excess magnesium ions, so the structure of this form of the mutant enzyme could not be determined. Presumably, however, the magnesium ions can select out six oxygen-containing
ligands in the mutant enzyme, and this may affect the interactions with the second zinc ion.
ION MIGRATION IN PROTEINS CONTAINING MORE THAN ONE METAL
Often when one metal ion is replaced by another of similar size and charge, the enzyme is still active. For example, Zn(II) can be replaced in many enzymes by Co(II), Mn(II), or Cd(II) with only a small loss of activity. Other metal ions, such as Pb(II), that bind in the same site may completely inhibit the enzyme. In alcohol dehydrogenase, for example, the Zn(II) may be replaced by Co(II), Ni(II), or Cu(II), with enzyme activities [as percentages of the Zn(II) enzyme] of 70, 30, 12, and 8, respectively. Metal ions compete with each other for metal-binding sites on proteins; the process is dynamic and depends on the availability of the various ions. Such hopping from one site to another has been followed spectroscopically in the enzyme D-xylose isomerase, as diagrammed in Figure 18a (Sudfeldt et al., 1990).
Figure 18. The active site of the enzyme D-xylose isomerase (Carrell et al., 1989). (a) The groups binding the two metal ions and their preferences for the A and the B site: site A has a high affinity for Mg(II), Eu(III), and Sm(III); site B has a high affinity for Co(II), Cd(II), Pb(II), and VO(II). The substrate displaces the two water molecules in the A site. (b) Metal binding to a histidine group in the wild-type enzyme and to asparagine via a water molecule in the His220Asn mutant enzyme (Cha et al., 1994). (c) View of the active site of the wild-type enzyme.
This enzyme is a tetramer and contains two metal-binding sites in each of its four subunits. The amino acid side chains that bind these two metals are different. One site involves four carboxylate groups and two water molecules; this is where the substrate binds, displacing the two water molecules. The other binding site
involves three carboxylate groups (one of which binds in a bidentate manner), one histidine residue, and one water molecule. For full activation of the enzyme, both sites have to be filled with the correct metal ion. The histidine side chain changes the nature of the binding site so that it attracts transition-metal ions rather than magnesium, which binds well in the other site. The metal-binding site involving a histidine residue is called site B, while site A has no histidine ligand, only carboxylate groups and water. The metal ion used as a spectroscopic marker was divalent cobalt. When Co(II) is added to the apoenzyme, the first four cobalt ions go equally into the A and B sites. After equilibration, the next four cobalt ions occupy the A site. Spectroscopic experiments show that the highest activity is found when Co(II) occupies the A site. Excess Co(II) replaces Mg(II) in the A site and decreases the activity. Pb(II) and Cd(II) are strong inhibitors. Spectroscopic studies indicate that Cd(II) or Pb(II) can replace Co(II) in the B site, and the released Co(II) goes to the A site. Substrate will still bind to the 4Co/4Pb enzyme. Many studies have been made of the binding of different metal ions to D-xylose isomerase (Whitlow et al., 1991; van Tilbeurgh et al., 1992; Jenkins et al., 1992). Protein engineering methods can be used to prepare mutant enzymes. In one such mutant, the histidine in the metal-binding site (His220) is replaced by asparagine, and the space between the metal ion and the smaller asparagine residue is filled by a water molecule (Figure 18b). This mutant enzyme, however, is no longer appreciably active (Cha et al., 1994).
ANION BINDING TO PROTEIN FUNCTIONAL GROUPS
Anions are generally larger than cations. They are often made up of several atoms covalently bonded together, as in phosphate, sulfate, and related groups. For example, sulfate and phosphate groups consist of a tetrahedral arrangement of oxygen atoms at 1.5-1.6 Å around the central atom. This means that the cavity for binding such an anion must generally be larger than that for a metal ion. Anions such as phosphate, sulfate, acetate, and nitrate have oxygen atoms on their periphery. Therefore the binding cavity in the protein must be lined by positively charged groups, such as metal ions, or by hydrogen-bond donors. Examples are provided by the binding of sulfate groups in the crystal structures of two different proteins, a sulfate-binding protein and 6-phosphogluconate dehydrogenase (Pflugrath and Quiocho, 1985; Phillips et al., 1995) (Figure 19). In the latter protein there are several sulfate-binding sites, two of which are in the active site. The strongest binding occurs at the site that would normally bind phosphate. The carboxylate-binding site will also bind sulfate, but less tightly and with several water molecules involved. Sometimes, in order to bind such ions, the protein must first be desalted, as was found for sulfate binding to lysozyme (Ries-Kautt et al., 1994). This enzyme will not crystallize in the presence of high sulfate concentrations unless it is first desalted by passage through strong cation- and then anion-exchange resins; this gives a protein solution in which the only counterions are hydrogen and hydroxide ions, to which the required ions can then be added. In an interesting analysis, it has been shown that the ionization state of a carboxyl group can be inferred from the number of hydrogen bonds it makes, as shown in Figure 20 (Ramanadham et al., 1993).
In this example involving lysozyme, the carboxyl group of aspartic acid 52 forms four hydrogen bonds and is therefore assumed to be ionized, whereas glutamic acid 35 forms only two hydrogen bonds and is therefore considered to be un-ionized. These are the two active-site residues situated on either side of the glycosidic bond cleaved by lysozyme, and the ionization state of these two acids is an important feature of the catalytic mechanism.
Figure 19. The binding of sulfate groups in (a) a sulfate-binding protein (Pflugrath and Quiocho, 1988), and (b) and (c) 6-phosphogluconate dehydrogenase (Phillips et al., 1995). In (b) this site is normally filled by substrate phosphate and in (c) it is normally filled by the carboxylate group of the substrate.
Several analyses of intermolecular contacts of the type described earlier for carboxylate-metal ion binding have been made using the Cambridge Structural Database (Cody and Murray-Rust, 1984). The most likely site for anion binding is near a positively charged center such as a metal ion. This has been shown for a series of substrates binding to the enzyme D-xylose isomerase (H. L. Carrell et al., 1994). The sugars, with their many hydroxyl groups, bind to the metal ion and span the active site so that they also bind to a histidine residue. When an ionized analogue such as threonate is bound, its charged carboxylate group lies near the metal ion. In alkaline phosphatase (Kim and Wyckoff, 1991), described earlier, the phosphate group is held by the two zinc ions and by the two terminal nitrogen atoms of Arg166, which together hold the phosphate group firmly in place, as shown in Figure 21. In the cadmium-substituted enzyme, two of the oxygen atoms
of the phosphate group are bound to one cadmium ion, and one of these oxygen atoms also binds the other cadmium ion. The other two oxygen atoms of the phosphate group are still firmly held by Arg166. These observations led to suggestions for the nature of the transition state of the catalyzed reaction (see Figure 22). When NADPH is bound in an enzyme, its phosphate groups tend to be bound to water
and to arginine side chains, as illustrated in Figure 23 (Al-Karadaghi et al., 1994). In trypanothione reductase (Bailey et al., 1984), the phosphate groups of the cofactors are bound to water, the hydroxyl groups of tyrosine, and backbone NH groups. In the di-iron center of methane monooxygenase (Swindells, 1994), an acetate group is directly coordinated to both iron atoms, bridging them. This binding is similar to that of an aspartate side chain in hemerythrin (Figure 24).
Figure 20. The use of hydrogen-bond counting to estimate the ionization state of a carboxyl group, illustrated for lysozyme: (a) Asp52, ionized (four hydrogen bonds); (b) Glu35, not ionized (two hydrogen bonds). When there are many hydrogen bonds or metal-ion interactions, it can be assumed that the carboxylate group is ionized. When there are few hydrogen bonds (two), the carboxylic acid is assumed not to be ionized.
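The counting rule summarized in Figure 20 can be written as a small decision function. The Python sketch below is illustrative only and is not the procedure used by Ramanadham et al. (1993). The names `IonizationState` and `carboxyl_ionization_state` are hypothetical. Treating metal coordination as sufficient evidence of ionization follows the figure caption. Labeling a count of exactly three hydrogen bonds as ambiguous is an assumption, because the text contrasts only four bonds (ionized) with two (un-ionized).

```python
from enum import Enum


class IonizationState(Enum):
    IONIZED = "ionized"
    UN_IONIZED = "un-ionized"
    AMBIGUOUS = "ambiguous"


def carboxyl_ionization_state(n_hbonds: int, metal_bound: bool = False) -> IonizationState:
    """Estimate the ionization state of a carboxyl group from its contacts.

    Assumed rule for this sketch: metal coordination or four or more
    hydrogen bonds -> ionized; two or fewer -> un-ionized; exactly three
    is not covered by the source and is reported as ambiguous.
    """
    if n_hbonds < 0:
        raise ValueError("hydrogen-bond count cannot be negative")
    if metal_bound or n_hbonds >= 4:
        return IonizationState.IONIZED
    if n_hbonds <= 2:
        return IonizationState.UN_IONIZED
    return IonizationState.AMBIGUOUS


if __name__ == "__main__":
    # Hydrogen-bond counts for the two lysozyme residues of Figure 20.
    for residue, count in [("Asp52", 4), ("Glu35", 2)]:
        print(residue, carboxyl_ionization_state(count).value)
```

Run as a script, it prints "Asp52 ionized" and "Glu35 un-ionized". A practical version would also need geometric criteria (donor-acceptor distances and angles) to decide which contacts count as hydrogen bonds; this sketch takes the count as given.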
Figure 21. The binding of (a) zinc and (b) cadmium ions to alkaline phosphatase. Note that cadmium forms a bidentate interaction with the phosphate group.
Figure 22. The inferred transition state of the action of alkaline phosphatase, based on the data in Figure 21. (a) The active and (b) the inactive forms.
Figure 23. The binding of the pyrophosphate group of an NADPH cofactor to liver alcohol dehydrogenase. (a) Diagram and (b) crystal structure (Al-Karadaghi et al., 1994).
Figure 24. (a) Acetate binding to methane monooxygenase, compared with (b) the binding of an active-site aspartate in hemerythrin.
Many monovalent anions are inhibitors of the carbonic anhydrase reaction; they either displace the zinc-bound water molecule or bind near the zinc ion and increase its coordination number. Examples of inhibitory anions are iodide (I-) and aurocyanide ([Au(CN)2]-). The crystal structures of human carbonic anhydrase complexed with each of these two inhibitors have been determined (Kumar et al., 1994). In the iodide complex, the iodide ion occupies the fourth coordination position of the zinc ion, which is normally occupied by the H2O/OH- ligand. Otherwise the
distorted tetrahedron around the zinc ion is not perturbed, but the complex is inactive. The Zn-I distance is 2.7 Å, as shown in Figure 25 (Kumar et al., 1994).
Figure 25. (a) Diagram of the binding of aurocyanide, water, and iodide to carbonic anhydrase, with some metal ion-ligand distances listed. (b) Crystal structure (Kumar et al., 1994).
Figure 26. Binding of (a) water/hydroxide and (b) acetate to carbonic anhydrase. Also shown in (c) is the binding of acetate to a mutant enzyme.
Figure 27. Binding of (a) γ-thio-GTP and (b) aluminum fluoride to the Giα1 protein. (c) Crystal structure of aluminum fluoride binding (Coleman et al., 1994).
The [Au(CN)2]- ion does not bind in the same way. Instead of displacing the H2O/OH- ligand, it forms a hydrogen bond to it. The N≡C-Au group is bent by 13°. The hydrogen bond that aurocyanide forms to the metal-bound hydroxyl group prevents the latter from forming a hydrogen bond to the hydroxyl group of Thr199. The hydrogen atom of the hydroxyl group then points toward the substrate (carbon dioxide) binding site, thereby interfering with substrate binding
and inhibiting the enzyme reaction. This reorientation of the hydrogen atom of the zinc-bound hydroxyl group is what the hydroxyl-Thr199-Glu106 hydrogen-bond network is believed to prevent in the normally functioning enzyme. This hydrogen-bond network also orients a lone pair of the zinc-bound hydroxide toward carbon dioxide, the substrate, and so facilitates nucleophilic attack.
Figure 28. Binding of (a) citrate, (b) isocitrate, and (c) fluorocitrate to metal ions. The two prochiral -CH2-COO- groups of citrate are labelled A and B to show that the binding of fluorocitrate is in the opposite direction to that of isocitrate in the enzyme aconitase.
In the acetate complexes of human carbonic anhydrase II and its E106Q mutant (Hakansson et al., 1994b), the zinc ion is bound to three histidine groups, one water molecule, and one oxygen atom of the acetate carboxyl group. For the enzyme to be active, atoms that are not hydrogen-bond donors must be kept from coordinating at the zinc-bound water position, and this is achieved by the Glu106-Thr199 hydrogen-bond network. In the wild-type structure, acetate binds in what is called the carboxylate site; the hydroxyl group of Thr199, acting as a hydrogen-bond acceptor, prevents the carboxylate oxygen atom from entering the zinc-water position. In E106Q, the hydrogen-bond network is reversed, and the hydroxyl group of
Thr199 can act as either a hydrogen-bond donor or acceptor with respect to the zinc-bound water. Therefore acetate can bind with a carboxylate oxygen atom near the zinc-water position, as illustrated in Figure 26. Aluminum fluoride, in the form of AlF4-, can bind to Gα-GDP at the site occupied by the γ-phosphate in the Gα-GTP complex, and in doing so it activates the enzyme (Sondek et al., 1994). The 1.7 Å resolution structure of transducin α-GDP activated by AlF4- shows an octahedrally coordinated aluminum ion with four fluoride ions in the equatorial plane and two apical oxygen atoms, one from the β-phosphate group of GDP and one from water. The GDP-O-(AlF4-·H2O) complex is held firmly in place by the two terminal nitrogen atoms of an arginine residue (Arg174), which form hydrogen bonds to one fluoride ion and to the oxygen atom linked to GDP. Thr177 and Gln200 help stabilize the water molecule bound to the AlF4-. As shown in Figure 27, a calcium ion also binds to two of the fluoride ions, indicating
other interactions in this region of the enzyme-bound GDP molecule. This crystal structure led to a suggested mechanism of action of the enzyme (Sondek et al., 1994). A word on fluorine and fluorides seems appropriate here. There is some confusion about the role of fluorine, the most electronegative element, which often appears fairly inert when covalently bound to carbon. However, when there are activating groups nearby in the molecule, such as the carboxylate group of trifluoroacetate, the anion shows its own idiosyncratic mode of binding. This is particularly true for fluorocitrate, which has been suggested not to bind to the enzyme aconitase in the same way that citrate does (H. L. Carrell et al., 1970). The tendency of the fluorine atom of a C-F bond to enter the coordination sphere of metal ions has been shown by X-ray diffraction studies (Murray-Rust et al., 1983). This may explain the highly poisonous character of one isomer of fluorocitrate, the one that has the fluorine atom on the arm of the molecule that, in citrate, would not be acted on by the enzyme. Thus the addition of a fluorine atom to citrate has made it bind the "wrong way round," as shown in Figure 28 (H. L. Carrell et al., 1970).
METHODS OF PREDICTION OF ION-BINDING SITES
The overall picture that emerges is that ions bind readily to an appropriately sized cavity lined with several groups carrying the opposite charge. Metal ions bind well to negatively charged groups such as carboxylate groups, and an inviting cavity for metal ions has been shown to be present in D-xylose isomerase. Similarly, anions bind in cavities that contain plenty of groups (preferably positively charged) able to bind them, generally by hydrogen bonding (that is, through the hydrogen atoms of donor groups such as NH and OH). To conclude, some methods currently in use for identifying such sites in proteins are described. Several computer programs have been written to predict ion-binding sites in proteins. One of the best known is the program GRID (Goodford, 1985; Boobbyer et al., 1989; Wade and Goodford, 1993; Wade et al., 1993), which has been used successfully to determine optimum sites on proteins for the binding of specified ions or functional groups chosen by the investigator. An empirical energy function is used to calculate the interaction energy between a chemical probe, such as a water molecule, and the target molecule (a protein). The result is a map of favorable binding sites for the selected probe on the surface of the target molecule, so the program can be used to analyze how cations or anions would bind to a protein. The energy function contains Lennard-Jones, electrostatic, and hydrogen-bonding terms derived from experimental crystal structure data. The energies are displayed as contours around the target molecule with the graphics program FRODO (Jones, 1978). A metal ion carries a positive, approximately spherically symmetrical charge distribution. It gathers around it atoms, ions, or groups that are negatively charged, that is,
electron-pair donors (Lewis bases). In proteins, these electron-pair donors are oxygen, nitrogen, and sulfur atoms. In addition to binding metal ions, these groups readily bind water molecules and hence may be described as hydrophilic. In protein side chains, however, these electron-donor atoms are covalently bound to carbon atoms, which are hydrophobic. With this in mind, the environments of metal ions in proteins have been critically evaluated by Eisenberg and coworkers (Yamashita et al., 1990). They identified the common feature of metal-binding sites in proteins as a shell of hydrophilic groups (containing oxygen, nitrogen, or sulfur atoms) embedded in a larger shell of hydrophobic groups (containing carbon atoms). They described these sites as having high hydrophobicity contrast, that is, a rapid change from hydrophilic to hydrophobic with increasing distance (up to 7 Å) of the surrounding atoms from the metal-binding site. In addition, the hydrophobic outer shell provides an interior region of low dielectric constant that may enhance electrostatic interactions within it. They wrote a program that searches for such regions of high hydrophobicity contrast and treats them as potential metal-ion binding sites. It is presumed that the hydrophobic shell around the hydrophilic shell can restrict the flexibility of the metal-binding site (Serpersu et al., 1986). By Coulomb's law, the force F between two charges q1 and q2, acting along the line between their centers, is inversely proportional to the dielectric constant D and to the square of the distance r between the charges:
$$F = \frac{q_1 q_2}{k D r^2} \tag{1}$$
where k is a constant. When a point charge q interacts with a dipole of moment μ, the interaction energy is
$$E = \frac{q \mu \cos\theta}{k D r^2} \tag{2}$$
where θ is the angle between the direction of the dipole moment and the line joining the point charge to the center of the dipole. Thus charge-charge interactions depend on the distance between the charges, while charge-dipole interactions depend on both the distance and the relative orientation. The electrostatic potential on the surface of a protein can be calculated by classical electrostatic theory, and it indicates charged areas on the molecule that would readily attract charges of the opposite sign (Gilson and Honig, 1987). The interior of a protein may be considered as a homogeneous dielectric medium that can be polarized by electric charges. The dielectric constant of the protein interior is low (2 to 3) because the reorientation of dipolar groups is restricted. Since water has a high dielectric constant (approximately 80), the protein-water interface is the boundary between two dielectric media. These principles can be used to derive electrostatic potentials within the protein from calculated partial charges and known atomic coordinates. The protein is divided into cubes 1 Å on an edge, with a dielectric constant assigned to each cube (Sternberg et al., 1987). The charge
position and the shape of the molecule are taken into account. The result can be used to estimate pKa values and their shifts. These ideas have been extended to locate calcium-binding sites in proteins precisely (Nayal and Di Cera, 1994). Nayal and Di Cera used the principle, mentioned earlier, put forward by Pauling (1929) and developed further by Brown and coworkers (Brown and Wu, 1976; Brown, 1978, 1988), that when a divalent metal ion (charge +2) binds to a protein, the total charge lining the cavity must be -2. This negatively charged lining is provided by oxygen, nitrogen, and sulfur atoms, each carrying a partial or complete negative charge between 0 and -1.0. The coordination number of the cation (the number of atoms arranged around it) depends on the relative sizes of the cation and the anion: the larger the cation and the smaller the anion, the more groups can bind around the cation. This can be analyzed in terms of the cation-to-anion radius ratio. Values of radius ratios and the most likely coordination numbers of metal ions are given in Table 6. Thus if the radius ratio lies between 0.16 and 0.24, the cation will bind four anions. Each site around the protein (calculated on a grid with a selected spacing) is checked to see whether it can provide an environment with a total negative charge of -1.4 to -2.0, so that a calcium ion can comfortably bind with the expected coordination number (6 to 9). The higher the charge or partial negative charge on an anion surrounding a positively charged atom or group, the closer it is presumed to come to the cation. As a result, some measure of the partial charge on each oxygen, nitrogen, or sulfur atom in the first coordination sphere can be obtained from the relative distances between these atoms and the central metal ion. A formula of the type
$$v = \left(\frac{R}{R_1}\right)^{-N} \tag{3}$$
can fit to the experimental data to give values for the bond valence v between two atoms, one a metal ion and the other an atom in its first coordination sphere at an experimentally determined cation-oxygen distance R. For example, values Rl and N for various ions are as follows: 1.909 and 5.4, respectively, for calcium ions; 1.622 and 4.290, respectively, for both sodium and magnesium ions; and 2.276 and 9.1 for potassium ions (Brown and Wu, 1976). The general equation for cations is v = s(R^ l )"< a6CN+2 - 2)
(4)
where $s$ is the average bond valence, $R_1$ is the average bond length, and $CN$ is a typical coordination number. For example, if a magnesium salt has Mg2+-O bond distances of 2.07, 2.13, 2.08, 2.14, 2.12, and 2.03 Å, then by equation (3) the bond valences of these metal-oxygen interactions correspond to charges on the oxygen atoms of -0.351, -0.311, -0.344, -0.303, -0.317, and -0.382, respectively. These add up to -2.009, which balances the charge of +2 on the metal ion. In the simplest case, when a magnesium ion with a charge of +2 gathers six oxygen atoms around it, the charge on each would be expected to be -1/3 (-0.333) and all metal-oxygen distances would be equal (Figure 29). In the example given above, the distances are unequal, and the charge on an oxygen atom is slightly higher when its distance from the metal ion is shorter, and vice versa.
Table 6. Radius Ratios, Ionic Radii, and Average Coordination Numbers
(a) Cation-anion radius ratios
Coordination number   Radius ratio   Coordination polyhedron
3                     0.155          Triangle
4                     0.244          Tetrahedron
6                     0.414          Octahedron
8                     0.645          Square antiprism
8                     0.732          Cube
12                    1.0            Cube-octahedron
(b) Ionic radii and average coordination numbers (Brown, 1988)
Cation            Cation radius (Å)   Average coordination number
beryllium         0.31                4.0
aluminum          0.50                5.3
cobalt(III)       0.54                5.9
chromium(III)     0.58                6.0
lithium           0.60                5.3
manganese(III)    0.62                5.8
iron(III)         0.62                5.7
magnesium         0.65                6.0
nickel(II)        0.66                5.9
copper(II)        0.69                5.1
cobalt(II)        0.70                5.7
zinc(II)          0.71                5.0
tin(IV)           0.71                5.9
copper(I)         0.72                2.2
iron(II)          0.74                5.9
manganese(II)     0.80                6.0
lead(IV)          0.84                5.7
palladium(II)     0.86                4.4
cadmium(II)       0.91                6.1
sodium            0.95                6.7
mercury(II)       0.98                5.5
calcium           0.99                7.3
silver(I)         1.10                5.1
strontium         1.13                8.6
potassium         1.33                9.0
barium            1.35                10.2
rubidium          1.48                9.8
cesium            1.69                10.4
Figure 29. Surrounding of a magnesium ion. (a) All distances and ligand atom types are the same, and the assumed charge on each oxygen atom is -0.333 electrons. (b) The assessment of partial charges when the distances vary: the longer the metal ion-oxygen distance, the smaller the negative charge.
Another method for locating metal-binding sites on proteins involves a probe of the type shown in Figure 30, derived from an analysis of the locations of metal ions around carboxylate and imidazole groups (C. J. Carrell et al., 1988; A. B. Carrell et al., 1993). This analysis had a significant practical application: it provided a method for locating metal-binding sites in proteins. It was used successfully to locate the metal ions in the crystal structure of the enzyme D-xylose isomerase (H. L. Carrell et al., 1989). A probe of the type shown in Figure 30 (a carboxyl group with a metal ion in its plane in syn, anti, or direct positions) was laid on each carboxyl group of the protein in its crystal structure. This probing was done with a computer graphics system and highlighted the
metal-binding site because several carboxyl groups were clustered near it. In this way it was possible to derive a new, simple, and general method for locating possible sites of metal binding in proteins.
Figure 30. Two probes that were successfully used to determine the metal-binding sites in D-xylose isomerase: (a) a carboxylate ion with metal ions in its plane and (b) a histidine side chain with two metal ion-binding sites.
Klebe (1994) has used the Cambridge Crystallographic Database to map possible interaction sites in proteins and nucleic acids by deriving what he calls "composite (average) crystal-field environments" about different functional groups. Histograms are provided of the relative angular arrangement of ligand and receptor in small-molecule crystal structures listed in the Cambridge Crystallographic Database. For example, because hydrogen bonds are preferably linear, the scatter of hydrogen-bond donors (D-H···A-X) about acceptor groups (A-X) shows a much broader distribution than does that of hydrogen-bond acceptors (D-H···A-X) about donor functional groups (D-H). Studies have been made of the geometry of hydrogen bonding to carboxylate groups (Klebe, 1994; Pirard et al., 1995). Short, linear hydrogen bonds are formed to NH groups, and the hydrogen atoms tend to form a crown around the acceptor oxygen atoms in a manner similar to that of sodium and potassium ions. Another method for studying potential ion-binding sites involves docking ions or molecules into the binding pocket of a protein using three-dimensional information on the most probable geometry of a ligand-receptor interaction (Langridge et al., 1981; Böhm, 1992a,b). This has proven particularly useful for the design of new drugs as well as for the prediction of ion-binding sites on proteins.
ACKNOWLEDGMENTS
This work was supported by grants CA-10925 and CA-06927 from the National Institutes of Health and by an appropriation from the Commonwealth of Pennsylvania. I thank Drs. C. K. Prout, H. L. Carrell, R. J. P. Williams, and P. J. Goodford for many helpful discussions, Amy K. Katz for assistance with the figures, and Oriel College and the Department of Chemical Crystallography in Oxford for their hospitality while I wrote this article.
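As a numerical illustration of equation (3) and the magnesium example given earlier in this chapter, the Python sketch below uses only the Brown and Wu (1976) parameters quoted in the text. The function names are hypothetical, and `looks_like_calcium_site` is a deliberately simplified stand-in for the grid test of Nayal and Di Cera (1994), not a reproduction of their program. With these parameters the computed oxygen charges agree with the values quoted for the magnesium salt to within about 0.002, and their sum is close to -2.

```python
from typing import Sequence

# Brown and Wu (1976) parameters (R1 in angstroms, N dimensionless) as quoted
# in the text for equation (3): v = (R / R1) ** (-N).
BROWN_WU_PARAMS = {
    "Ca": (1.909, 5.4),
    "Mg": (1.622, 4.290),
    "Na": (1.622, 4.290),
    "K": (2.276, 9.1),
}


def bond_valence(ion: str, distance: float) -> float:
    """Bond valence v of one cation-oxygen contact at distance R (angstroms)."""
    r1, n = BROWN_WU_PARAMS[ion]
    return (distance / r1) ** (-n)


def valence_sum(ion: str, distances: Sequence[float]) -> float:
    """Sum of bond valences around the cation; ideally close to its formal charge."""
    return sum(bond_valence(ion, d) for d in distances)


def distance_for_valence(ion: str, valence: float) -> float:
    """Invert equation (3): distance at which a single contact carries `valence`."""
    r1, n = BROWN_WU_PARAMS[ion]
    return r1 * valence ** (-1.0 / n)


def looks_like_calcium_site(distances: Sequence[float]) -> bool:
    """Hypothetical, simplified screen: 6 to 9 oxygen ligands whose bond
    valences sum to between 1.4 and 2.0 (the range quoted in the text)."""
    total = valence_sum("Ca", distances)
    return 6 <= len(distances) <= 9 and 1.4 <= total <= 2.0


if __name__ == "__main__":
    mg_o = [2.07, 2.13, 2.08, 2.14, 2.12, 2.03]  # Mg-O distances from the text
    for d in mg_o:
        # The charge assigned to each oxygen is minus its bond valence.
        print(f"R = {d:.2f} A -> charge on O = {-bond_valence('Mg', d):.3f}")
    print(f"total charge on the six oxygens = {-valence_sum('Mg', mg_o):.3f}")
    # Equal sharing: each of six oxygens carries 1/3 of the +2 charge.
    print(f"Mg-O distance for v = 1/3: {distance_for_valence('Mg', 1 / 3):.2f} A")
    print(looks_like_calcium_site([2.45] * 7))
```

Inverting equation (3) gives a Mg-O distance of about 2.1 Å for six equivalent oxygen atoms each carrying -1/3, the symmetric case of Figure 29(a). A candidate calcium site with seven oxygens at 2.45 Å passes the simplified screen, whereas the same seven oxygens at 2.30 Å give a valence sum well above 2 and fail it.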
REFERENCES
Ahrland, S., Chatt, J., and Davies, N.R. (1958). The relative affinity of ligand atoms for acceptor molecules and ions. Quart. Rev. 12, 265-276.
Al-Karadaghi, S., Cedergren-Zeppezauer, E.S., Hovmöller, S., Petratos, K., Terry, H., and Wilson, K.S. (1994). Refined crystal structure of liver alcohol dehydrogenase-NADH complex at 1.8 Å resolution. Acta Cryst. D50, 793-807.
Allen, F.H., Bellard, S., Brice, M.D., Cartwright, B.A., Doubleday, A., Higgs, H., Hummelink, T., Hummelink-Peters, G.G., Kennard, O., Motherwell, W.D.S., Rodgers, J.R., and Watson, D.G. (1979). The Cambridge Crystallographic Data Centre: Computer-based search, retrieval, analysis and display of information. Acta Cryst. B35, 2331-2339.
Bacon, G.E., and Curry, N.A. (1962). The water molecules in CuSO4·5H2O. Proc. Roy. Soc. (London) A266, 95-108.
Bailey, S., Fairlamb, A.H., and Hunter, W.N. (1994). Structure of trypanothione reductase from Crithidia fasciculata at 2.6 Å resolution; enzyme-NADP interactions at 2.8 Å resolution. Acta Cryst. D50, 139-154.
Beese, L.S., and Steitz, T.A. (1991). Structural basis for the 3'-5' exonuclease activity of Escherichia coli DNA polymerase I: a two metal ion mechanism. EMBO J. 10, 15-33.
Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer, E.F., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., and Tasumi, M. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Molec. Biol. 112, 535-542.
Bertini, I., Luchinat, C., Rosi, M., Sgamellotti, A., and Tarantelli, F. (1990). pKa of zinc-bound water and nucleophilicity of hydroxo-containing species. Ab initio calculations on models for zinc enzymes. Inorg. Chem. 29, 1460-1463.
Blackwell, K.A., Anderson, B.F., and Baker, E.N. (1994). Metal substitution in a blue-copper protein: the crystal structure of cadmium-azurin at 1.8 Å resolution. Acta Cryst. D50, 263-270.
Blevins, R.A., and Tulinsky, A. (1985). The refinement and the structure of the dimer of α-chymotrypsin at 1.67 Å resolution. J. Biol. Chem. 260, 4264-4275.
Bock, C.W., Kaufman, A., and Glusker, J.P. (1994). Coordination of water to magnesium cations. Inorg. Chem. 33, 419-427.
Bock, C.W., Katz, A.K., and Glusker, J.P. (1995). Hydration of zinc ions: A comparison with magnesium and beryllium ions. J. Amer. Chem. Soc. 117, 3754-3765.
Böhm, H.-J. (1992a). The computer program LUDI: A new method for the de novo design of enzyme inhibitors. J. Comput. Aided Molec. Design 6, 61-78.
Böhm, H.-J. (1992b). LUDI: Rule-based automatic design of new substituents for enzyme inhibitor leads. J. Comput. Aided Molec. Design 6, 593-606.
Boobbyer, D.N.A., Goodford, P.J., McWhinnie, P.M., and Wade, R.C. (1989). New hydrogen-bond potentials for use in determining energetically favorable binding sites on molecules of known structure. J. Medic. Chem. 32, 1083-1094.
Borders, C.L. Jr., Broadwater, J.A., Bekeny, P.A., Salmon, J.E., Lee, A.S., Eldridge, A.M., and Pett, V.B. (1994). A structural role for arginine in proteins: multiple hydrogen bonds to backbone oxygens. Protein Sci. 3, 541-548.
Borkakoti, N., Winkler, F.K., Williams, D.H., D'Arcy, A., Broadhurst, M.J., Brown, P.A., Johnson, W.H., and Murray, E.J. (1994). Structure of the catalytic domain of human fibroblast collagenase complexed with an inhibitor. Nature, Structural Biology 1, 106-110.
Brown, I.D., and Wu, K.K. (1976). Empirical parameters for calculating cation-oxygen bond valences. Acta Cryst. B32, 1957-1959.
Brown, I.D. (1978). Bond valences: A simple structural model for inorganic chemistry. Chem. Soc. Rev. 7, 359-376.
Brown, I.D. (1988). What factors determine coordination numbers? Acta Cryst. B44, 545-553.
Burley, S.K., and Petsko, G.A. (1988). Weakly polar interactions in proteins. Adv. Protein Chem. 39, 125-189.
Carrell, A.B., Shimoni, L., Carrell, C.J., Bock, C.W., Murray-Rust, P., and Glusker, J.P. (1993). The stereochemistry of the recognition of nitrogen-containing heterocycles by hydrogen bonding and by metal ions. Receptor 3, 57-75.
Carrell, C.J., Carrell, H.L., Erlebacher, J., and Glusker, J.P. (1988). Structural aspects of metal ion-carboxylate interactions. J. Amer. Chem. Soc. 110, 8651-8656.
Carrell, H.L., Glusker, J.P., Villafranca, J.J., Mildvan, A.S., Dummel, R.J., and Kun, E. (1970). Fluorocitrate inhibition of aconitase: Relative configuration of inhibitory isomer by X-rays. Science 170, 1412-1414.
Carrell, H.L., Rubin, B.H., Hurley, T.J., and Glusker, J.P. (1984). X-ray crystal structure of D-xylose isomerase at 4 Å resolution. J. Biol. Chem. 259, 3230-3236.
Carrell, H.L., Glusker, J.P., Piercy, E.A., Stallings, W.C., Zacharias, D.E., Davis, R.L., Astbury, C., and Kennard, C.H.L. (1987). Metal chelation versus internal hydrogen bonding of the α-hydroxy carboxylate group. J. Amer. Chem. Soc. 109, 8067-8071.
Carrell, H.L., Glusker, J.P., Burger, V., Manfre, F., Tritsch, D., and Biellmann, J.-F. (1989). X-ray analysis of D-xylose isomerase at 1.9 Å: Native enzyme in complex with substrate and with a mechanism-designed inactivator. Proc. Natl. Acad. Sci. USA 86, 4440-4444.
Carrell, H.L., Hoier, H., and Glusker, J.P. (1994). Modes of binding substrates and their analogues to the enzyme D-xylose isomerase. Acta Cryst. D50, 113-123.
Cha, J., Cho, Y., Whitaker, R.D., Carrell, H.L., Glusker, J.P., Karplus, P.A., and Batt, C.A. (1994). Perturbing the metal site in D-xylose isomerase. Effects of mutations of His-220 on enzyme stability. J. Biol. Chem. 269, 2687-2694.
Chakrabarti, P. (1989). Geometry of interaction of metal ions with sulfur-containing ligands in proteins. Biochemistry 28, 6081-6085.
Chakrabarti, P. (1990a). Interaction of metal ions with carboxylic and carboxamide groups in protein structures. Protein Engineering 4, 49-56.
Chakrabarti, P. (1990b). Geometry of interaction of metal ions with histidine residues in protein structures. Protein Engineering 4, 57-63.
Christianson, D.W. (1991). Structural biology of zinc. Adv. Protein Chem. 42, 281-353.
Cody, V., and Murray-Rust, P. (1984). Iodine···X(O,N,S) intermolecular contacts: Models of thyroid hormone-protein binding interactions using information from the Cambridge Crystallographic Data Files. J. Molec. Struct. 112, 189-199.
Coleman, D.E., Lee, E., Mixon, M.B., Linder, M.E., Berghuis, A.M., Gilman, A.G., and Sprang, S.R. (1994). Crystallization and preliminary crystallographic studies of Giα1 and mutants of Giα1 in the GTP and GDP-bound states. J. Molec. Biol. 238, 630-634.
Einspahr, H., and Bugg, C.E. (1981). The geometry of calcium-carboxylate interactions in crystalline complexes. Acta Cryst. B37, 1044-1052.
Emmerich, C., Helliwell, J.R., Redshaw, M., Naismith, J.H., Harrop, S.J., Raftery, J., Kalb (Gilboa), A.J., Yariv, J., Dauter, Z., and Wilson, K.S. (1994). High-resolution structures of single-metal-substituted concanavalin A: the Co,Ca-protein at 1.6 Å and the Ni,Ca-protein at 2.0 Å. Acta Cryst. D50, 749-756.
Erlebacher, J., and Carrell, H.L. (1992). ICRVIEW: Graphics program for use on Silicon Graphics computers from the Institute for Cancer Research. Fox Chase Cancer Center, Philadelphia, PA.
Forsén, S., Kördel, J., Grundström, T., and Chazin, W.J. (1993). The molecular anatomy of a calcium-binding protein. Acc. Chem. Res. 26, 7-14.
Frey, C.M., and Stuehr, J. (1974). Kinetics of metal ion interactions with nucleotides and base-free phosphates. In: Metal Ions in Biological Systems (Sigel, H., Ed.), Vol. 1, pp. 56-116. Marcel Dekker, New York.
Gandour, R.D. (1981). On the importance of orientation in general base catalysis by carboxylate. Bioorg. Chem. 10, 169-176.
Garmer, D.R., and Krauss, M. (1992). Metal substitution and the active site of carbonic anhydrase. J. Amer. Chem. Soc. 114, 6487-6493.
Ghosh, M., Anthony, C., Harlos, K., Goodwin, M.G., and Blake, C. (1995). The refined structure of the quinoprotein methanol dehydrogenase from Methylobacterium extorquens at 1.94 Å resolution. Structure 3, 177-187.
Gilson, M.K., and Honig, B.H. (1987). Calculation of electrostatic potentials in an enzyme active site. Nature (London) 330, 84-86.
Glusker, J.P., and Trueblood, K.N. (1985). Crystal Structure Analysis: A Primer. Second Ed. Oxford University Press, New York.
Glusker, J.P. (1991). Structural aspects of metal liganding to functional groups in proteins. Adv. Protein Chem. 42, 1-73.
Glusker, J.P. (1994). Cation-activated enzymes. Encyclopedia of Inorganic Chemistry 2, 598-609.
Goodford, P.J. (1985). A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J. Medic. Chem. 28, 849-857.
Gould, R.O., Gray, A.M., Taylor, P., and Walkinshaw, M.D. (1985). Crystal environments and geometries of leucine, isoleucine, valine, and phenylalanine provide estimates of minimum nonbonded contact and preferred van der Waals interaction. J. Amer. Chem. Soc. 107, 5921-5927.
Graves, B.J. (1995). Integrin binding revealed. Nature, Structural Biology 2, 181-183.
Håkansson, K., Wehnert, A., and Liljas, A. (1994a). X-ray analysis of metal-substituted human carbonic anhydrase II derivatives. Acta Cryst. D50, 93-100.
Håkansson, K., Briand, C., Zaitsev, V., Xue, Y., and Liljas, A. (1994b). Wild-type and E106Q mutant carbonic anhydrase complexed with acetate. Acta Cryst. D50, 101-104.
Irving, H., and Williams, R.J.P. (1953). The stability of transition-metal complexes. J. Chem. Soc. 3192-3210.
Jaffe, E.K. (1993). Predicting the Zn(II) ligands in metalloproteins: Case study, porphobilinogen synthase. Comments Inorg. Chem. 15, 67-92.
Jahn, H.A., and Teller, E. (1937). Stability of polyatomic molecules in degenerate electronic states. I. Orbital degeneracy. Proc. Roy. Soc. (London) A161, 220-235.
Jeffrey, G.A. (1987). The nanometer world of hydrogen bonds. In: Patterson and Pattersons: Fifty Years of the Patterson Function (Glusker, J.P., Patterson, B.K., and Rossi, M., Eds.), pp. 193-221. International Union of Crystallography/Oxford University Press, New York.
Jenkins, J., Janin, J., Rey, F., Chiadmi, M., van Tilbeurgh, H., Lasters, I., De Maeyer, M., Van Belle, D., Wodak, S.J., Lauwereys, M., Stanssens, P., Mrabet, N.T., Snauwaert, J., Matthyssens, G., and Lambeir, A.-M. (1992). Protein engineering of xylose (glucose) isomerase from Actinoplanes missouriensis. 1. Crystallography and site-directed mutagenesis of metal binding sites. Biochemistry 31, 5449-5458.
Jones, T.A. (1978). A graphics model building and refinement system for macromolecules. J. Appl. Cryst. 11, 268-272.
Kaufman, A., Afshar, C., Rossi, M., Zacharias, D.E., and Glusker, J.P. (1993). Metal ion coordination in cobalt formate dihydrate. Struct. Chem. 4, 191-198.
Kim, E.E., and Wyckoff, H.W. (1991). Reaction mechanism of alkaline phosphatase based on crystal structures. Two-metal ion catalysis. J. Molec. Biol. 218, 449-464.
Klebe, G. (1994). The use of composite crystal-field environments in molecular recognition and the de novo design of protein ligands. J. Molec. Biol. 237, 212-235.
Kretsinger, R.H., and Nockolds, C.E. (1973). Carp muscle calcium-binding protein. II. Structure determination and general description. J. Biol. Chem. 248, 3313-3326.
Kumar, V., Kannan, K.K., and Sathyamurthi, P. (1994). Differences in anionic inhibition of human carbonic anhydrase I revealed from the structures of iodide and gold cyanide inhibitor complexes. Acta Cryst. D50, 731-738.
Landro, J.A., Gerlt, J.A., Kozarich, J.W., Koo, C.W., Shah, V.J., Kenyon, G.L., Neidhart, D.J., Fujita, S., and Petsko, G.A. (1994). The role of lysine 166 in the mechanism of mandelate racemase from Pseudomonas putida: Mechanistic and crystallographic evidence for stereospecific alkylation by (R)-α-phenylglycidate. Biochemistry 33, 635-643.
Langridge, R., Ferrin, T.E., Kuntz, I.D., and Connolly, M.L. (1981). Real-time color graphics in studies of molecular interactions. Science 211, 661-666.
Lauble, H., Kennedy, M.C., Beinert, H., and Stout, C.D. (1992). Crystal structures of aconitase with isocitrate and nitroisocitrate bound. Biochemistry 31, 2735-2748.
Lee, J.O., Rieu, P., Arnaout, M.A., and Liddington, R. (1995). Crystal structure of the A domain from the α subunit of integrin CR3 (CD11b/CD18). Cell 80, 631-638.
Mildvan, A.S. (1970). Metals in enzyme catalysis. In: The Enzymes (Boyer, P.D., Ed.), Third ed., Vol. 2, pp. 446-536. Academic Press, New York.
Muirhead, H., Clayden, D.A., Barford, D., Lorimer, C.G., Fothergill-Gilmore, L.A., Schiltz, E., and Schmitt, W. (1986). The structure of cat muscle pyruvate kinase. EMBO J. 5, 475-481.
Murphy, J.E., Xu, X., and Kantrowitz, E.R. (1993). Conversion of a magnesium site into a zinc binding site by a single amino acid substitution in Escherichia coli alkaline phosphatase. J. Biol. Chem. 268, 21497-21500.
Murray-Rust, P., Stallings, W.C., Monti, C.T., Preston, R.K., and Glusker, J.P. (1983). Intermolecular interactions of the C-F bond: The crystallographic environment of fluorinated carboxylic acids and related structures. J. Amer. Chem. Soc. 105, 3206-3214.
Murray-Rust, P., and Glusker, J.P. (1984). Directional hydrogen bonding to sp2- and sp3-hybridized oxygen atoms and its relevance to ligand-macromolecule interactions. J. Amer. Chem. Soc. 106, 1018-1025.
Naismith, J.H., Habash, J., Harrop, S., Helliwell, J.R., Hunter, W.N., Wan, T.C.M., Weisgerber, S., Kalb (Gilboa), A.J., and Yariv, J. (1993). Refined structure of cadmium-substituted concanavalin A at 2.0 Å resolution. Acta Cryst. D49, 567-571.
Nayal, M., and Di Cera, E. (1994). Predicting Ca2+-binding sites in proteins. Proc. Natl. Acad. Sci. USA 91, 817-821.
Neidhart, D.J., Howell, P.L., Petsko, G.A., Powers, V.M., Li, R., Kenyon, G.L., and Gerlt, J.A. (1991). Mechanism of the reaction catalyzed by mandelate racemase. 2. Crystal structure of mandelate racemase at 2.5 Å resolution: Identification of the active site and possible catalytic residues. Biochemistry 30, 9264-9273.
Parraga, G., Horvath, S.J., Eisen, A., Taylor, W.E., Hood, L., Young, E.T., and Klevit, R.E. (1988). Zinc-dependent structure of a single-finger domain of yeast ADR1. Science 241, 1489-1492.
Pascard, C. (1995). Small-molecule crystal structures as a structural basis for drug design. Acta Cryst. D51, 407-417.
Pauling, L. (1929). The principles determining the structure of complex ionic crystals. J. Amer. Chem. Soc. 51, 1010-1026.
Pavletich, N.P., and Pabo, C.O. (1991). Zinc finger-DNA recognition: Crystal structure of a Zif268-DNA complex at 2.1 Å resolution. Science 252, 809-817.
Pearson, R.G. (1963). Hard and soft acids and bases. J. Amer. Chem. Soc. 85, 3533-3539.
Pflugrath, J.W., and Quiocho, F.A. (1985). Sulphate sequestered in the sulphate-binding protein of Salmonella typhimurium is bound solely by hydrogen bonds. Nature (London) 314, 257-260.
Phillips, C., Gover, S., and Adams, M.J. (1995). Structure of 6-phosphogluconate dehydrogenase refined at 2 Å resolution. Acta Cryst. D51, 290-304.
Pirard, B., Baudoux, G., and Durant, F. (1995). A database study of intermolecular N-H···O hydrogen bonds for carboxylates, sulfonates and monohydrogen phosphonates. Acta Cryst. B51, 103-107.
Ramanadham, M., Jakkal, V.S., and Chidambaram, R. (1993). Carboxyl group hydrogen bonding in X-ray protein structures analyzed using neutron studies on amino acids. FEBS Lett. 323, 203-206.
Rao, S.T., Wu, S., Satyshur, K.A., Ling, K.-Y., Kung, C., and Sundaralingam, M. (1993). Structure of Paramecium tetraurelia calmodulin at 1.8 Å resolution. Protein Sci. 2, 436-447.
Ries-Kautt, M., Ducruix, A., and Van Dorsselaer, A. (1994). Crystallization of previously desalted lysozyme in the presence of sulfate ions. Acta Cryst. D50, 366-369.
Rosenfield, R.E., Parthasarathy, R., and Dunitz, J.D. (1977). Directional preferences of nonbonded atomic contacts with divalent sulfur. I. Electrophiles and nucleophiles. J. Amer. Chem. Soc. 99, 4860-4862.
Rosenfield, R.E. Jr., Swanson, S.M., Meyer, E.F. Jr., Carrell, H.L., and Murray-Rust, P. (1984). Mapping the atomic environment of functional groups: turning 3D scatter plots into pseudo-density contours. J. Molec. Graphics 2, 43-46.
Serpersu, E.H., Shortle, D., and Mildvan, A.S. (1986). Kinetic and magnetic resonance studies of effects of genetic substitution of a Ca2+-liganding amino acid in Staphylococcal nuclease. Biochemistry 25, 68-77.
Shephard, W.E.B., Kingston, R.L., Anderson, B.F., and Baker, E.N. (1993). Structure of apo-azurin from Alcaligenes denitrificans at 1.8 Å resolution. Acta Cryst. D49, 331-343.
Shimoni, L., and Glusker, J.P. (1995). Hydrogen bonding motifs of protein side chains: Description of binding of arginine and amide groups. Protein Sci. 4, 65-74.
Sjölin, L., Tsai, L.-C., Langer, V., Pascher, T., Karlsson, G., Nordling, M., and Nar, H. (1993). Structure of Pseudomonas aeruginosa zinc-azurin mutant Asn47Asp at 2.4 Å resolution. Acta Cryst. D49, 449-457.
Sondek, J., Lambright, D.G., Noel, J.P., Hamm, H.E., and Sigler, P.B. (1994). GTPase mechanism of G proteins from the 1.7-Å crystal structure of transducin α-GDP-AlF4-. Nature (London) 372, 276-279.
Sowadski, J.M., Handschumacher, M.D., Murthy, H.M.K., Kundrot, C., and Wyckoff, H.W. (1983). Crystallographic observations of the metal ion triple in the active site region of alkaline phosphatase. J. Molec. Biol. 170, 575-581.
Sternberg, M.J.E., Hayes, F.R.F., Russell, A.J., Thomas, P.G., and Fersht, A.R. (1987). Prediction of electrostatic effects of engineering protein charges. Nature (London) 330, 86-88.
Strynadka, N.C.J., and James, M.N.G. (1989). Crystal structures of the helix-loop-helix calcium-binding proteins. Annu. Rev. Biochem. 58, 951-958.
Strynadka, N.C.J., and James, M.N.G. (1994). Calcium-binding proteins. Encyclopedia of Inorganic Chemistry 2, 477-507.
Sudfeldt, C., Schaffer, A., Kägi, J.H.R., Bogumil, R., Schulz, H.-P., Wulff, S., and Witzel, H. (1990). Spectroscopic studies on the metal-ion-binding sites of Co2+-substituted D-xylose isomerase from Streptomyces rubiginosus. Eur. J. Biochem. 193, 863-871.
Sutor, D.J. (1963). Evidence for the existence of C-H···O hydrogen bonds in crystals. J. Chem. Soc. 1105-1110.
Swindells, M.B. (1994). Coordination of acetate with the di-iron centre of methane monooxygenase. Nature, Structural Biology 1, 81-82.
Tainer, J.A., Getzoff, E.D., Beem, K.M., Richardson, J.S., and Richardson, D.C. (1982). Determination and analysis of the 2 Å structure of copper, zinc superoxide dismutase. J. Molec. Biol. 160, 181-217.
Taylor, R., and Kennard, O. (1984). Hydrogen bond geometry in organic crystals. Acc. Chem. Res. 17, 320-326.
Taylor, R. (1995). Applications of crystal field environments in molecular design. Abstract CL4. BCA-BACG Joint Spring Meeting, University of Cardiff, March 27-31.
Trewhella, J., Liddle, W.K., Heidorn, D.B., and Strynadka, N. (1989). Calmodulin and troponin C structures studied by Fourier transform infrared spectroscopy: Effects of Ca2+ and Mg2+ binding. Biochemistry 28, 1294-1301.
Tsukada, H., and Blow, D.M. (1985). Structure of α-chymotrypsin refined at 1.68 Å resolution. J. Molec. Biol. 184, 703-711.
Umeyama, H., and Morokuma, K. (1977). The origin of hydrogen bonding. An energy decomposition study. J. Amer. Chem. Soc. 99, 1316-1332.
Vallee, B.L., and Auld, D.S. (1990). Zinc coordination, function, and structure of zinc enzymes and other proteins. Biochemistry 29, 5647-5659.
van Tilbeurgh, H., Jenkins, J., Chiadmi, M., Janin, J., Wodak, S.J., Mrabet, N.T., and Lambeir, A.-M. (1992). Protein engineering of xylose (glucose) isomerase from Actinoplanes missouriensis. 3. Changing metal specificity and pH profile by site-directed mutagenesis. Biochemistry 31, 5467-5471.
Vedani, A., and Huhta, D.W. (1990). A new force field for modelling metalloproteins. J. Amer. Chem. Soc. 112, 4759-4767.
Wade, R.C., Clark, K.J., and Goodford, P.J. (1993). Further development of hydrogen bond functions for use in determining energetically favorable binding sites on molecules of known structure. 1. Ligand probe groups with the ability to form two hydrogen bonds. J. Medic. Chem. 36, 140-147.
Wade, R.C., and Goodford, P.J. (1993). Further development of hydrogen bond functions for use in determining energetically favorable binding sites on molecules of known structure. 2. Ligand probe groups with the ability to form more than two hydrogen bonds. J. Medic. Chem. 36, 148-156.
Watenpaugh, K.D., Sieker, L.C., and Jensen, L.H. (1980). Crystallographic refinement of rubredoxin at 1.2 Å resolution. J. Molec. Biol. 138, 615-633.
Whitlow, M., Howard, A.J., Finzel, B.C., Poulos, T.L., Winborne, E., and Gilliland, G.L. (1991). A metal-mediated hydride shift mechanism for xylose isomerase based on the 1.6 Å Streptomyces rubiginosus structure with xylitol and D-xylose. Proteins: Structure, Function, and Genetics 9, 153-173.
Xia, Z.-X., Dai, W.-W., Zhang, Y.-F., White, S.A., Boyd, G.D., and Mathews, F.S. (1996). Determination of the gene sequence and the three-dimensional structure at 2.4 Å resolution of methanol dehydrogenase from Methylophilus W3A1. J. Molec. Biol. 259, 480-501.
Yamashita, M.M., Wesson, L., Eisenman, G., and Eisenberg, D. (1990). Where metal ions bind proteins. Proc. Natl. Acad. Sci. USA 87, 5648-5652.
Zahrobsky, R.F., and Bauer, W.H. (1968). On the crystal chemistry of salt hydrates. V. The determination of the crystal structure of CuSO4·3H2O (bonattite). Acta Cryst. B24, 508-513.
Chapter 5
Protein Folding
Franz X. Schmid
Abstract
Introduction
The Cooperativity of Protein Folding
The Two-State Approximation
The Kinetics of Two-State Folding Transitions
Prolyl Peptide Bond Isomerizations as a Source for Complex Kinetics
Fast- and Slow-Folding Reactions
Prolyl Peptide Bond Isomerization in Protein Folding
Isomerization of Peptide Bonds not Preceding Proline
Catalysis by Prolyl Isomerase
Two-State Folding Reactions
The Chymotrypsin Inhibitor CI2
The Cold-Shock Protein CspB
Other Small and Fast-Folding Proteins
Folding Intermediates
Equilibrium Intermediates: The Molten Globule
Kinetic Intermediates and Their Relation with Equilibrium Intermediates
Characterization of Kinetic Folding Intermediates by Protein Engineering
The Role of Intermediates for Folding
Protein: A Comprehensive Treatise Volume 2, pages 153-215 Copyright © 1999 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 1-55938-672-X
The Rate-Limiting Events in Protein Folding
Analysis of the Properties of Activated States of Folding
Intermediates and Activated States
New Kinetic Methods to Follow Folding Events in the Submilliseconds Range
Folding of Disulfide-Containing Proteins
Folding of Two-Domain Proteins
The Immunoglobulin Light Chain
γB Crystallin
Folding and Association of Oligomeric Proteins
Folding of Large Monomeric and Oligomeric Proteins
Large Monomeric Proteins
Large Oligomeric Proteins
Concluding Remarks
ABSTRACT
The information for the folding of a protein chain to its unique three-dimensional structure is encoded in its amino acid sequence. The folded conformation is only marginally more stable than the unfolded forms, but proteins nevertheless find the native conformation rapidly and efficiently. The principles of protein folding are still not understood at the molecular level. Proteins differ strongly in size and structure and probably use diverse strategies for folding. This chapter gives an overview of the folding of very small single-domain proteins as well as of large multidomain and multisubunit proteins. The former illustrate the problems of chain folding at the elementary level, while the latter are used to discuss additional steps and problems, such as the correct assembly of preformed folding modules and the avoidance of nonproductive interactions in folding.
INTRODUCTION
Proteins are synthesized as linear polypeptide chains at the ribosome. The amino acid sequence encodes not only the function of a protein but also its efficient folding to the well-ordered, three-dimensional conformation of the native state. This encoding of the folding information in the sequence has been referred to as "the second half of the genetic code" (Goldberg, 1985). The biologically active conformation of a protein must be not only well ordered but also highly dynamic. Proteins achieve this combination of order and dynamics because the native state is only marginally more stable than the unfolded one. One of the major questions in protein folding is therefore: how can a protein chain find its native state so quickly and efficiently despite the astronomical number of unfolded conformations and despite the marginal stability of the native conformation? Clearly, folding cannot occur by trial and error, because a random search would be very slow, even for short polypeptide chains (Levinthal, 1968). Many proteins, however, fold extremely fast and efficiently. Several small proteins complete folding within a few milliseconds, and many more reach their native conformations within a second. These times are much shorter than the time needed for protein biosynthesis at the ribosome. High folding rates were traditionally explained by assuming ordered folding pathways (Kim and Baldwin, 1982, 1990) in which the formation of partially folded intermediates rapidly restricts the multitude of unfolded conformations to a few productive ones.
There are also proteins that fold very slowly. They require several hours to reach the native state, although partially folded forms are populated in these reactions as well. This led to the speculation that partially folded forms might not always be productive for folding, but that their formation could also retard folding or even lead the folding chains into kinetic traps (Creighton, 1994; Creighton et al., 1996; Baldwin, 1996). Such traps can be deleterious for folding because the risk of nonspecific aggregation increases sharply when incompletely folded forms with exposed hydrophobic patches are long-lived. The de novo folding of nascent protein chains is probably also plagued by kinetic traps and aggregation (King et al., 1996). In cells, chaperones bind to such partially structured forms, prevent their aggregation, and reroute them to a productive folding pathway. The mechanisms of chaperone-mediated folding are not yet well understood (Hartl, 1996; Todd et al., 1996; Weissman et al., 1995; Fenton and Horwich, 1997).
To understand the folding mechanism of a protein, it is necessary to study its kinetics and to establish reaction schemes for unfolding and refolding. Kinetically detected folding intermediates and the activated states between them have to be characterized with regard to their structure, their stability, and their position on the folding pathway.
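The random-search argument (Levinthal, 1968) can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: three conformations per residue and a sampling rate of 10^13 conformations per second are assumptions chosen for the example, not values from this chapter, and `random_search_time` is a hypothetical helper.

```python
# Random-search ("trial and error") estimate in the spirit of Levinthal (1968).
# States per residue and sampling rate are illustrative assumptions only.
SECONDS_PER_YEAR = 3.156e7


def random_search_time(n_residues: int,
                       states_per_residue: int = 3,
                       samples_per_second: float = 1e13) -> float:
    """Seconds needed to visit every chain conformation once, one at a time."""
    n_conformations = states_per_residue ** n_residues  # exact Python int
    return n_conformations / samples_per_second


if __name__ == "__main__":
    for n in (10, 50, 100):
        t = random_search_time(n)
        print(f"{n:3d} residues: {t:.1e} s (about {t / SECONDS_PER_YEAR:.1e} years)")
```

Under these assumptions a 100-residue chain would need on the order of 10^27 years to search its conformations exhaustively, and even a 50-residue chain would need thousands of years, whereas the text notes that many proteins fold within milliseconds to seconds.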
In principle, a folding pathway is understood when all intermediates and activated states are arranged in their correct order and when their structures are known. This is very difficult because folding intermediates are formed only transiently. Only intermediates that are followed by slow steps accumulate and can be studied. Intermediates that do not accumulate can still be very important for folding, but they are easily overlooked. The kinetics of protein folding are further complicated because the unfolded state of most proteins is conformationally heterogeneous and gives rise to multiple parallel refolding reactions (Garel and Baldwin, 1973; Brandts et al., 1975; Schmid, 1983, 1992). These parallel reactions can merge at certain stages of folding or remain separated until the native state is reached.
It has long been debated whether the native structure of a protein is under thermodynamic or kinetic control (Wetlaufer and Ristow, 1973; Sinclair et al., 1994), that is, whether the native protein is the state of lowest free energy or the state that is formed most rapidly. For many small proteins, the evidence for thermodynamic control is very clear and convincing: the unfolded and the native state exist in thermodynamic equilibrium within the unfolding transition, and this native state is, by several criteria, the same as the one found under physiological conditions. On the other hand, kinetic control should be important to ensure that the native state is reached in a short time. Both aspects could be satisfied by the evolution of folding mechanisms in which high energy barriers are absent in the early stages of folding, so that premature kinetic partitioning to trapped, nonnative conformations is avoided. The use of kinetic control to reach the state of lowest free energy rapidly could be termed a "kinetic consistency principle" of folding [in analogy to the consistency principle (Go, 1983) of short-range and long-range interactions in protein folding]. According to this principle, kinetic pathways of folding are consistent with, and lead to, the state of lowest free energy.
Rapid folding and kinetic constraints are probably not always sufficient to avoid aberrant folding along nonproductive routes that finally lead to aggregation. The de novo folding of many proteins in the cell involves transient interactions with chaperone systems (Goloubinoff et al., 1989; Hartl, 1996; Frydman and Hartl, 1996; Lorimer, 1996; Buchner, 1996; Fenton and Horwich, 1997). Of course, chaperones cannot carry specific information to direct the folding of a protein. Rather, they help refolding chains to avoid side reactions such as aggregation (Buchner et al., 1991).
This chapter concentrates on the mechanisms of unfolding and refolding in vitro, first of small monomeric proteins and then of large multidomain and multisubunit proteins. Protein folding has been the subject of several reviews and monographs. An excellent discussion of the fundamental aspects of protein folding and of the interrelationship between thermodynamic and kinetic aspects is provided by the three classic reviews by Tanford (1968a; 1968b; 1970). The conceptual framework developed in these articles is still worth reading. The articles by Jaenicke (1979; 1987; 1996a) and Creighton (1990) and the monographs edited by Creighton (1992a) and by Pain (1994) provide general overviews of folding.
The mechanism of folding, intermediates, and catalysis of folding were reviewed by Matthews (Matthews, 1993; Lecomte and Matthews, 1993), Dobson (Dobson, 1995; Miranker and Dobson, 1996), and Schmid (1992). Small proteins and the role of folding intermediates are the subject of two reviews by Kim and Baldwin (1982; 1990). A collection of papers on practical aspects of protein folding kinetics can be found in Hirs and Timasheff (1986) and Shirley (1995). A yearly update on new developments in the field of protein folding appears in every January issue of Current Opinion in Structural Biology.
THE COOPERATIVITY OF PROTEIN FOLDING
The Two-State Approximation
The folding of small proteins is often a cooperative process, and only folded (N) and unfolded (U) molecules, but not partially folded intermediates, are populated at equilibrium. The thermal and the denaturant-induced unfolding transitions of these proteins are thus well described by an N ⇌ U two-state model, which involves only U and N. Such two-state protein folding transitions are a consequence of the cooperative nature of protein stabilization. The low overall conformational stability of the folded molecules originates from a fine balance between many individual contributions. The loss of a few of these interactions can tip this balance, and thus partially folded conformations with fewer stabilizing interactions are unstable relative to N or U in a folding transition.
It is possible to test whether a protein folding reaction obeys the two-state approximation. In the simplest test the transition is measured by different probes such as fluorescence, absorbance, circular dichroism, or viscosity. For a two-state reaction, coincident transitions should be observed. A stringent criterion is provided by the microcalorimetric measurement of the thermal unfolding transition. Both the calorimetric enthalpy of unfolding ΔH_cal and the corresponding van't Hoff enthalpy ΔH_vH are obtained from a single experiment. ΔH_cal is determined directly from the heat absorption measured in the calorimeter cell, and ΔH_vH is calculated from the temperature dependence of the equilibrium constant under the assumption that the two-state model is valid. Identical values for ΔH_cal and ΔH_vH prove that a particular unfolding reaction is a two-state process. Values close to one for the ratio ΔH_cal/ΔH_vH have indeed been found for a number of small proteins (Privalov, 1979, 1982, 1996; Privalov and Gill, 1988). ΔH_cal/ΔH_vH values larger than 1 are found when intermediates are populated in a folding transition, whereas values smaller than 1 (such as 0.5) suggest that unfolding is coupled with a dissociation reaction, as in the dissociation and unfolding of a dimeric protein (Privalov and Gill, 1988). Intermediates that do not differ from either N or U in enthalpy and heat capacity are not detected by the calorimetric test. The respective transitions (N ⇌ I or I ⇌ U) would be insensitive to temperature variation and thus be "silent" in the calorimetric experiments.
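The van't Hoff enthalpy discussed above is obtained from the temperature dependence of the equilibrium constant alone, via the slope of ln K versus 1/T. The following is a minimal Python sketch, assuming a temperature-independent enthalpy over the fitted range (the heat-capacity change is ignored) and purely synthetic two-state data; the names `vant_hoff_enthalpy`, `dh_true`, `tm`, and `dh_cal` and all numbers are hypothetical illustration choices, not values for any real protein.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def vant_hoff_enthalpy(temps_k, k_values):
    """Least-squares slope of ln K versus 1/T; returns Delta H_vH = -R * slope (J/mol)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in k_values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return -R * sxy / sxx


# Synthetic two-state data (hypothetical): Delta H = 300 kJ/mol, midpoint Tm = 330 K,
# no heat-capacity change, so ln K = -(Delta H / R) * (1/T - 1/Tm).
dh_true = 300e3
tm = 330.0
temps = [320.0 + 2.0 * i for i in range(11)]  # 320 ... 340 K
k_d = [math.exp(-(dh_true / R) * (1.0 / t - 1.0 / tm)) for t in temps]

dh_vh = vant_hoff_enthalpy(temps, k_d)
dh_cal = 300e3  # hypothetical calorimetric enthalpy for the same sample
print(f"Delta H_vH = {dh_vh / 1e3:.1f} kJ/mol, ratio cal/vH = {dh_cal / dh_vh:.2f}")
```

In this idealized case the ratio is 1; with real data, a ratio well above 1 would point to populated intermediates, as described in the text.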
The Kinetics of Two-State Folding Transitions
For two-state protein-folding reactions, the kinetics and the equilibrium transition (cf. Equation 1) are correlated by a set of simple thermodynamic and kinetic relations, which are described by Equations 1-6 and illustrated in Figure 1. They were first described by Tanford and coworkers (Tanford, 1968b; Ikai and Tanford, 1973; Tanford et al., 1973). A two-state transition curve for a denaturant-induced unfolding transition is sketched in Figure 1A. This reaction follows the mechanism in Equation 1. Folding and unfolding are monophasic, first-order reactions under all conditions. The measured apparent rate constant λ is equal to the sum of the microscopic rate constants of unfolding and refolding, k_NU and k_UN (Equation 2), and the equilibrium constant for denaturation K_D is equal to the ratio k_NU/k_UN (Equation 3).
$$\mathrm{N} \underset{k_{\mathrm{UN}}}{\overset{k_{\mathrm{NU}}}{\rightleftharpoons}} \mathrm{U} \tag{1}$$
[Figure 1 (panels A-C); horizontal axis: [Denaturant] (M)]
Figure 1. Equilibrium and kinetics of a denaturant-induced two-state folding reaction. (A) Sketch of the equilibrium unfolding transition. (B) ln K_D as a function of the denaturant concentration. K_D is defined as in Equation 3. The slope in this plot is m_D (= d ln K_D/d[D]). (C) Plot of the measured rate constant λ (continuous line) and the microscopic rate constants of unfolding (k_NU, dotted line) and refolding (k_UN, dashed line) as a function of [D]. The slopes of these lines are the kinetic m values (m_NU and m_UN, respectively). The chevron for λ goes through a minimum near the midpoint of the transition, where k_NU = k_UN.
$$\lambda = k_{\mathrm{NU}} + k_{\mathrm{UN}} \tag{2}$$
$$K_{\mathrm{D}} = \frac{[\mathrm{U}]}{[\mathrm{N}]} = \frac{k_{\mathrm{NU}}}{k_{\mathrm{UN}}} \tag{3}$$
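Because Equations 2 and 3 are two relations in two unknowns, a measured λ together with K_D from the equilibrium transition at the same denaturant concentration fixes both microscopic rate constants: k_UN = λ/(1 + K_D) and k_NU = λK_D/(1 + K_D). A small Python sketch of this bookkeeping; the function name `microscopic_rates` and the example numbers are hypothetical.

```python
def microscopic_rates(lam, k_d):
    """Split the observed rate constant lam (Eq. 2) using K_D = k_NU / k_UN (Eq. 3).

    Returns (k_NU, k_UN) in the same units as lam.
    """
    k_un = lam / (1.0 + k_d)
    k_nu = lam * k_d / (1.0 + k_d)
    return k_nu, k_un


# Hypothetical example: lam = 10 s^-1 and K_D = 4 give k_NU = 8 s^-1 and k_UN = 2 s^-1.
k_nu, k_un = microscopic_rates(10.0, 4.0)
print(k_nu, k_un)
```

At the transition midpoint (K_D = 1) the split is even, so each microscopic rate constant equals λ/2.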
The Gibbs free energy of stabilization and thus ln K_D are assumed to depend linearly on the denaturant concentration (Equation 4 and Figure 1B).
$$\ln K_{\mathrm{D}} = \ln K_{\mathrm{D}}(\mathrm{H_2O}) + m_{\mathrm{D}}\,[\mathrm{D}] \tag{4}$$
In Equation 4, [D] is the concentration of denaturant, K_D(H2O) is the equilibrium constant of denaturation in the absence of the denaturant, and the cooperativity parameter m_D (= d ln K_D/d[D]) reflects the denaturant dependence of ln K_D (and thus of the Gibbs free energy of denaturation). The physical origin of the linear relationship in Equation 4 is not clear, but it nevertheless describes the stability curves of many proteins very well. Since K_D is equal to the ratio of the rate constants of unfolding and refolding (Equation 3), it is generally assumed that ln k_NU and ln k_UN also depend linearly on the denaturant concentration, as described by Equations 5 and 6 and as shown in Figure 1C.
$$\ln k_{\mathrm{NU}} = \ln k_{\mathrm{NU}}(\mathrm{H_2O}) + m_{\mathrm{NU}}\,[\mathrm{D}] \tag{5}$$
$$\ln k_{\mathrm{UN}} = \ln k_{\mathrm{UN}}(\mathrm{H_2O}) + m_{\mathrm{UN}}\,[\mathrm{D}] \tag{6}$$
In Equations 5 and 6, k_NU(H2O) and k_UN(H2O) are the microscopic rate constants of unfolding and refolding in the absence of denaturant, and m_NU and m_UN are their dependences on the denaturant concentration ([D]). Equations 5 and 6 can be combined with Equation 2 to describe the dependence on denaturant concentration of λ, the measured rate constant of folding (cf. Figure 1C). Two-state folding is, under all conditions, a monoexponential reaction, which is governed by the apparent rate constant λ. In the native baseline region, refolding is much faster than unfolding (k_UN ≫ k_NU), and therefore λ is approximately equal to k_UN (Figure 1C). The rate of folding decreases when the transition is approached, and at its midpoint (where K_D = 1) k_NU is equal to k_UN and both contribute equally to λ (cf. Equation 2). Folding is usually slowest in this region (Figure 1C). When the concentration of the denaturant is increased beyond the transition, (un)folding usually becomes faster again because the rate constant of unfolding k_NU increases. Under unfolding conditions at high denaturant concentration, λ approximates and follows k_NU (cf. Figure 1C).
The cooperativity parameter m_D (cf. Equation 4 and Figure 1B) depends on the differential interactions of the native and the unfolded protein with the denaturant (Schellman, 1978; Pace, 1986). In the binding model, m_D reflects the increase in the number of binding sites for denaturant molecules that become exposed upon unfolding, and thus m values report on the increase of accessible surface area during unfolding (Myers et al., 1995). From the analogy of Equations 4, 5, and 6, it is apparent that, similarly to the equilibrium value m_D, the kinetic m values (m_NU and m_UN) report on the changes in the interactions with the denaturant (and thus on the changes in accessible surface area) between the unfolded state and the activated state (m_UN) and between the native state and the activated state (m_NU).
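Equations 2, 5, and 6 together define the chevron. The Python sketch below evaluates λ over a denaturant range and reports the transition midpoint (where k_NU = k_UN), the position of the chevron minimum, and the Gibbs free energy of denaturation in water, ΔG_D(H2O) = −RT ln K_D(H2O). The parameters are hypothetical: the m values are borrowed from the CI2 fit in Figure 3, but the rate constants in water are invented for illustration.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1
T = 298.15          # K

# Hypothetical two-state parameters (rate constants in s^-1, m values in M^-1).
K_NU_W, M_NU = 1e-4, 1.24    # unfolding (Eq. 5)
K_UN_W, M_UN = 50.0, -1.90   # refolding (Eq. 6)


def k_nu(d):
    """Microscopic unfolding rate constant at denaturant concentration d (M)."""
    return K_NU_W * math.exp(M_NU * d)


def k_un(d):
    """Microscopic refolding rate constant at denaturant concentration d (M)."""
    return K_UN_W * math.exp(M_UN * d)


def lam(d):
    """Observed rate constant from Eq. 2."""
    return k_nu(d) + k_un(d)


# Midpoint: ln k_NU = ln k_UN, i.e. ln K_D = 0.
d_mid = math.log(K_UN_W / K_NU_W) / (M_NU - M_UN)
# Free energy of denaturation in water, positive for a stable protein (kJ/mol).
dg_water = -R * T * math.log(K_NU_W / K_UN_W)

# Locate the chevron minimum on a 1 mM grid from 0 to 8 M.
grid = [i * 0.001 for i in range(8001)]
d_min = min(grid, key=lam)

print(f"midpoint = {d_mid:.2f} M, chevron minimum = {d_min:.2f} M")
print(f"Delta G_D(H2O) = {dg_water:.1f} kJ/mol")
```

With these numbers the minimum lies slightly above the midpoint because |m_UN| > m_NU; the two coincide exactly only when m_NU = −m_UN, which is why the minimum is described as lying near the midpoint.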
The equilibrium m value should be equal to the difference of the kinetic m values (Equation 7).
$$m_{\mathrm{D}} = m_{\mathrm{NU}} - m_{\mathrm{UN}} \tag{7}$$
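Equation 7 follows directly from Equations 3-6. Taking the logarithm of Equation 3 and inserting Equations 5 and 6 gives:

```latex
\begin{aligned}
\ln K_{\mathrm{D}} &= \ln k_{\mathrm{NU}} - \ln k_{\mathrm{UN}} \\
&= \bigl[\ln k_{\mathrm{NU}}(\mathrm{H_2O}) - \ln k_{\mathrm{UN}}(\mathrm{H_2O})\bigr]
   + \bigl(m_{\mathrm{NU}} - m_{\mathrm{UN}}\bigr)\,[\mathrm{D}]
\end{aligned}
```

Comparing this term by term with Equation 4 yields ln K_D(H2O) = ln k_NU(H2O) − ln k_UN(H2O) and m_D = m_NU − m_UN. With the CI2 values from Figure 3 (m_UN = −1.90 M^-1 for refolding, m_NU = 1.24 M^-1 for unfolding), this predicts m_D ≈ 3.14 M^-1.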
For a two-state folding reaction, the observed kinetics are independent of the probe used to follow the reaction. When spectral changes are employed to monitor folding, the kinetics are independent of the wavelength, and at isosbestic, isoemissive, or isodichroic points, no signal changes should occur during the entire time course of folding and under varying final conditions. This criterion has already been pointed out by Tanford (1968b) as a good indicator for the absence of transiently populated intermediates. All these properties are very helpful to evaluate the two-state character of a folding reaction, but they do not rigorously rule out that intermediates are present. Indeed, apparent two-state kinetics were found within the transition region for a number of proteins that slowly equilibrate in the unfolded state and/or fold via structural intermediates, which strongly resemble the unfolded or the native protein or which are not well populated [see the discussions by Tanford (1968b) and by Utiyama and Baldwin (1986)].
PROLYL PEPTIDE BOND ISOMERIZATIONS AS A SOURCE FOR COMPLEX KINETICS
A single kinetic phase with a single rate constant implies that all the molecules are following the same pathway with the same rate constant. A unique aspect of protein folding is the conformational heterogeneity of the unfolded state. For molecules to fold by the same pathway, all conformational transitions in the unfolded state must be very rapid relative to the overall rate of folding. Perhaps not surprisingly, in hindsight, this generally is not the case. In particular, there are intrinsically slow isomerization processes in unfolded proteins that prevent the molecules from rapidly equilibrating conformationally.
Fast- and Slow-Folding Reactions
When fast mixing techniques (such as stopped-flow spectroscopy) were used to investigate the kinetics of protein folding, it became clear that for many proteins a fraction of the molecules could reach the native state within a second or less, whereas the remaining molecules folded much more slowly, often in the range of several minutes (Ikai and Tanford, 1973; Ikai et al., 1973; Tsong et al., 1971; Tsong et al., 1972; Garel and Baldwin, 1973; Nall et al., 1978; Hagerman and Baldwin, 1976). Fast and slow phases were most readily found in refolding experiments ending within the native baseline region. Unfolding under denaturing conditions was usually a monophasic process. Initially, mechanisms were suggested that involved cooperative sequential steps (Tsong et al., 1972) or off-pathway intermediates (Ikai and Tanford, 1973) in order to explain the kinetic data and their dependence on the denaturant concentration. The key observation by Garel and Baldwin (1973) that both fast and slow phases of RNase A folding produced enzymatically active protein led to the suggestion that the complex kinetics were caused by the coexistence of fast-folding (U_F) and slow-folding (U_S) species in the unfolded state of this protein (Equation 8).
$$\mathrm{U_S} \overset{\text{slow}}{\rightleftharpoons} \mathrm{U_F} \overset{\text{fast}}{\rightleftharpoons} \mathrm{N} \tag{8}$$
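How the scheme in Equation 8 produces biphasic refolding can be illustrated with a minimal Python sketch. It assumes strongly native final conditions, so that back reactions are neglected; that U_S must first isomerize to U_F before it can fold (folding of U_S via intermediates is ignored); and first-order steps with k_fast ≠ k_slow. The function name `fraction_native` and all fractions and rate constants are hypothetical.

```python
import math


def fraction_native(t, frac_fast, k_fast, k_slow):
    """Fraction of N at time t for direct U_F -> N plus consecutive U_S -> U_F -> N.

    Irreversible first-order steps (an assumption valid only far into the native
    baseline); requires k_fast != k_slow.
    """
    # Molecules that start as U_F fold in a single fast phase.
    fast_part = frac_fast * (1.0 - math.exp(-k_fast * t))
    # Molecules that start as U_S pass through two consecutive first-order steps.
    consecutive = 1.0 - (
        k_fast * math.exp(-k_slow * t) - k_slow * math.exp(-k_fast * t)
    ) / (k_fast - k_slow)
    return fast_part + (1.0 - frac_fast) * consecutive


# Hypothetical example: 80% U_F folding with tau = 20 ms, 20% U_S isomerizing with tau = 20 s.
for t in (0.0, 0.2, 20.0, 200.0):
    print(f"t = {t:6.1f} s  fraction N = {fraction_native(t, 0.8, 50.0, 0.05):.3f}")
```

After 0.2 s roughly the U_F fraction (about 80%) is native, and the remainder appears only on the time scale of the slow isomerization, which mirrors the fast and slow refolding phases described above.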
Prolyl Peptide Bond Isomerization in Protein Folding
The existence of slow equilibration reactions in unfolded protein chains that create a mixture of U_F and U_S species was a surprise in 1973. A plausible molecular explanation was provided by Brandts and coworkers (Brandts et al., 1975), who suggested that the U_S molecules refold slowly because they contain incorrect isomers of Xaa-Pro peptide bonds. Peptide bonds are planar and can be either in the trans or in the cis conformation. Peptide bonds not involving proline residues are generally in the trans state; the cis conformation has not been detected in unstructured, linear oligopeptides, and at equilibrium, less than 0.1% of all molecules are believed to be in the cis form (Brandts et al., 1975). In fact, very few nonproline cis peptide bonds have been found in native proteins by X-ray crystallography (Stewart et al., 1990; MacArthur and Thornton, 1991). Unlike other peptide bonds, those between proline and its preceding amino acid (Xaa-Pro bonds; Figure 2) typically exist as a mixture of cis and trans isomers in solution unless structural constraints, such as those in folded proteins, stabilize one of the two isomers. In the absence of ordered structure and in short linear peptides, the trans isomer is usually favored slightly over the cis. A cis content of 10-30% is frequently found (Cheng and Bovey, 1977; Grathwohl and Wüthrich, 1981); it depends primarily on the chemical nature of the flanking amino acids and on the charge distribution around the Xaa-Pro bond. The cis/trans isomerization is an intrinsically slow reaction (time constants of 10-100 s are observed at 25 °C) with a high activation energy (E_A = 85 kJ/mol) because it involves rotation about a partial double bond. Approximately 7% of all prolyl peptide bonds in native proteins of known three-dimensional structure are cis (Stewart et al., 1990; MacArthur and Thornton, 1991).
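The high activation energy quoted above makes prolyl isomerization strongly temperature dependent. A short Python sketch using the Arrhenius relation k(T2)/k(T1) = exp[−(E_A/R)(1/T2 − 1/T1)] with E_A = 85 kJ/mol from the text; the temperatures (25 °C and 35 °C) and the 50 s time constant (chosen from within the 10-100 s range cited) are illustrative assumptions, and `rate_ratio` is a hypothetical helper name.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1
E_A = 85e3       # activation energy of prolyl cis/trans isomerization, J/mol


def rate_ratio(t1_c, t2_c, e_a=E_A):
    """Arrhenius ratio k(T2)/k(T1) for temperatures given in degrees Celsius."""
    t1, t2 = t1_c + 273.15, t2_c + 273.15
    return math.exp(-(e_a / R) * (1.0 / t2 - 1.0 / t1))


ratio = rate_ratio(25.0, 35.0)
tau_25 = 50.0  # s, illustrative time constant at 25 degC
print(f"k(35 C)/k(25 C) = {ratio:.2f}; tau drops from {tau_25:.0f} s to {tau_25 / ratio:.0f} s")
```

A 10 °C rise thus accelerates the isomerization about threefold, and cooling slows it by a similar factor.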
[Figure 2: the trans (60-90%) and cis (10-40%) isomers of an Xaa-Pro peptide bond]
Figure 2. Alternative isomeric states of a peptidyl-prolyl bond and its cis/trans isomerization.
The conformational state of each peptide bond is usually well defined, being either cis or trans in every molecule depending on the structural constraints imposed by the chain folding. Rare exceptions exist: cis/trans equilibria at particular Xaa-Pro bonds have been detected in native staphylococcal nuclease (Evans et al., 1987) and in calbindin (Chazin et al., 1989). In most cases, the native protein, N, has each prolyl peptide bond in a unique conformation, either cis or trans. After unfolding (N → U_F), however, these bonds become free to isomerize slowly, as in small oligopeptides, thus leading to an equilibrium mixture of molecules with different prolyl isomers. The chains with the correct set of isomers usually refold rapidly; these are the U_F molecules. Chains with at least one incorrect proline isomer (the U_S molecules) refold more slowly. The reisomerizations of the incorrect prolyl bonds are slow steps in folding. Nonnative isomers, however, do not necessarily block refolding, and reisomerization is not required to be the first step of folding, as was suggested initially. Chains with certain incorrect isomers can rapidly form ordered nativelike structure prior to prolyl peptide bond reisomerization (Cook et al., 1979; Schmid and Blaschek, 1981; Goto and Hamaguchi, 1982b; Kelley and Stellwagen, 1984; Kelley and Richards, 1987; Kiefhaber et al., 1990b; Nall, 1985).
The fraction of U_S molecules depends on the number of proline residues and on their isomeric state in the native protein. In particular, the presence of cis prolyl peptide bonds in the folded molecules leads to a high fraction of U_S, because the cis state is usually less populated in the absence of structural constraints. In lysozyme (Kato et al., 1982), cytochrome c (Ridge et al., 1981; Ramdas and Nall, 1986; White et al., 1987; Wood et al., 1988; Nall, 1985), bovine pancreatic trypsin inhibitor (Hurle et al., 1991), barnase (Matouschek et al., 1990), barstar (Schreiber and Fersht, 1993), or the chymotrypsin inhibitor CI2 (Jackson and Fersht, 1991a), which have only trans prolyl peptide bonds, the U_F molecules dominate in the unfolded protein. In RNase A (Garel and Baldwin, 1973; Houry et al., 1994; Houry et al., 1995) and RNase T1 (Kiefhaber et al., 1990a,b,c), both of which have two cis prolyl peptide bonds, the fraction of U_F is reduced to 0.04. An immunoglobulin fragment with a single cis prolyl peptide bond displays only 20% U_F (Goto and Hamaguchi, 1982b). In the refolding of the U_S molecules, prolyl peptide bond isomerization is coupled with conformational folding.
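The dependence of the U_F fraction on the number and native isomeric state of the prolines can be illustrated with a deliberately simplified model, sketched here in Python. It assumes that every Xaa-Pro bond isomerizes independently in the unfolded state and that a chain counts as U_F only if all its prolyl bonds carry their native isomer. The cis content of 20% for every bond is a hypothetical value within the 10-30% range cited above, and `fraction_uf` is a hypothetical helper name.

```python
from math import prod


def fraction_uf(native_isomers, cis_content):
    """Fraction of unfolded chains with all prolyl bonds in their native isomer.

    native_isomers: 'cis' or 'trans' for each Xaa-Pro bond in the native state.
    cis_content: equilibrium cis fraction of the same bonds in the unfolded state.
    Assumes independent isomerization of every bond (a simplification).
    """
    return prod(
        c if iso == "cis" else 1.0 - c
        for iso, c in zip(native_isomers, cis_content)
    )


# Hypothetical cases, each bond 20% cis in the unfolded state:
print(fraction_uf(["cis"], [0.2]))              # one native cis bond
print(fraction_uf(["cis", "cis"], [0.2, 0.2]))  # two native cis bonds
print(fraction_uf(["trans"] * 4, [0.2] * 4))    # four native trans bonds
```

With these assumed numbers, one native cis bond leaves 20% U_F and two leave about 4%, in the range of the immunoglobulin fragment and RNase values quoted above, whereas each native trans proline lowers U_F only by a factor of 0.8. The agreement is only indicative: real cis contents vary from bond to bond, and native trans prolines also contribute to U_S.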
The presence of incorrect isomers in the chain decelerates folding steps, and the equilibrium and kinetic properties of Xaa-Pro peptide bond isomerization are changed by the preceding folding steps. Prolyl isomerization is well established as a slow reaction in various unfolded proteins. Nevertheless, other potential sources of slow interconversion reactions in unfolded protein molecules should not be dismissed. Slow ligand exchange (Sosnick et al., 1994; Sosnick et al., 1996) or loop-threading reactions of disulfide-bonded chains (Nall et al., 1978) have been discussed as possible sources of slow reactions.
Isomerization of Peptide Bonds not Preceding Proline
For normal peptide bonds (not preceding proline), the trans state is favored at least 100-fold over cis (Ramachandran and Mitra, 1976; Jorgensen and Gao, 1988), which implies that, in unfolded proteins, on average one incorrect cis peptide bond occurs in every 100 residues. In addition, several cis peptide bonds not preceding proline have been found in native proteins by X-ray crystallography, most often near active sites, where they might be important for the functions of these proteins. The cis/trans isomerizations of such peptide bonds could thus also occur in protein folding. However, until recently no protein folding reactions were known that are limited in rate by the isomerization of normal peptide bonds. In RNase T1, a cis Tyr-Ala bond could be generated by the replacement of cis-Pro39 by an alanine (Mayr et al., 1993, 1994b; Mayr and Schmid, 1993). This mutation causes a major change in the folding mechanism of this protein, and the very slow trans → cis isomerization of the Tyr38-Ala39 peptide bond now determines the refolding rate of the Pro39Ala variant (Odefey et al., 1995). The time constant of this reaction is about 500 s at 25 °C. A slow nonprolyl peptide bond isomerization was also detected in the refolding of the Pro167Thr variant of β-lactamase (Vanhove et al., 1995, 1996). Because the trans state of a normal peptide bond is populated at least 100-fold more highly than the cis state, the cis → trans reaction is at least 100-fold faster than the trans → cis reaction. This suggests that the cis → trans isomerization of the very few peptide bonds that on average might be cis in an unfolded protein should occur in less than a second. In contrast, the reverse trans → cis isomerization is very slow and, as in the case of the Pro39Ala variant of RNase T1, should be a rate-limiting step in the folding of proteins that contain cis peptide bonds (Herzberg and Moult, 1991) in their native conformations.
Catalysis by Prolyl Isomerase
Around 1978, it became clear that prolyl isomerization retards the folding of many proteins (Schmid, 1993), and, as a consequence, several laboratories began to search for an enzyme that could catalyze prolyl isomerization. In 1984, Fischer and his coworkers (Fischer et al., 1984) devised an ingenious assay for such an activity and isolated the first prolyl isomerase (from porcine kidney; now known as cytoplasmic cyclophilin 18). In vitro, this enzyme accelerated the proline-limited folding reactions of many proteins, and catalysis was efficient when the prolines remained accessible to the isomerase during folding (Schmid, 1993; Schmid et al., 1993). The porcine prolyl isomerase was found to be identical with cyclophilin (Cyp18), the cytosolic binding protein for the immunosuppressive drug cyclosporin A (CsA).
As a matter of fact, this protein was discovered independently twice in 1984: once by virtue of its tight binding to CsA (Handschumacher et al., 1984) and once by its catalytic activity as a prolyl isomerase (Fischer et al., 1984). It took until 1989 to recognize that porcine prolyl isomerase and cyclophilin were identical proteins (Fischer et al., 1989; Takahashi et al., 1989). The binding protein for another immunosuppressant, FK506, is also a prolyl isomerase and catalyzes protein folding in vitro. This FK506 binding protein (FKBP12) is unrelated to the cyclophilins. Cyclophilins and FKBPs are found in virtually all species and all cellular compartments, and they can also function as domains of larger proteins such as FKBP52 or Cyp40 (Fischer, 1994; Galat and Metcalfe, 1995). More recently, novel prolyl isomerases (the parvulins) were discovered, which are unrelated to the cyclophilins and the FKBPs and which cannot be inhibited by immunosuppressants (Rahfeld et al., 1994a,b). The functions of Cyp18 and FKBP12 in immunosuppression involve neither their prolyl isomerase activity nor the catalysis of a protein-folding reaction. Rather, these proteins are recruited by CsA and FK506, respectively, to form tight complexes. These complexes bind to calcineurin and thus inhibit the transport to the nucleus of a cytosolic component of the transcription factor NF-AT in T cells (Fischer, 1994; Galat and Metcalfe, 1995). Good evidence that the prolyl isomerase activities of cyclophilins are involved in cellular folding comes from studies on the folding of collagen (Steinmann et al., 1991) and on the folding of mitochondrial proteins (Rassow et al., 1995; Matouschek et al., 1995b). Proteins that are targeted to the mitochondrial matrix have to be unfolded in order to cross the two membranes, and then they must refold in the matrix.
The refolding of an artificial precursor protein composed of the presequence of subunit 9 of the Neurospora crassa F1F0-ATPase fused to mouse cytosolic dihydrofolate reductase (Su9-DHFR) was retarded two- to threefold in mitochondria derived from yeast and N. crassa mutants that lacked a functional mitochondrial cyclophilin. The kinetics of DHFR refolding in these mutant mitochondria were almost identical with the refolding kinetics in wild-type mitochondria in the presence of CsA. These results provide very strong evidence that the mitochondrial cyclophilin acts as a prolyl isomerase and thus can catalyze cellular protein folding. The functions of the huge number of prolyl isomerases are certainly not restricted to de novo protein folding. The link with immunosuppression, the interaction of a host cyclophilin with the Gag polyprotein of HIV-1 (Franke et al., 1994; Yoo et al., 1997; Zhao et al., 1997), and the stabilization of a calcium release channel by an FK506 binding protein (Timerman et al., 1995) point to additional functions [see Fischer (1994) and Galat and Metcalfe (1995) for reviews]. Unlike the catalysis of folding, these functions seem to involve tight binding to the target molecules. Whether such binding can change the cis/trans isomerism of prolines at the protein surface is not yet known. Molecular switching by prolyl cis/trans isomerization should be possible, but conclusive experimental evidence is still lacking.
TWO-STATE FOLDING REACTIONS
The two-state approximation is very useful because, as outlined in Equations 1-6, it leads to a set of simple relations to derive stability data, to analyze the folding kinetics, and to relate and cross-validate the thermodynamic and kinetic data. Initially, several protein folding reactions seemed to follow a two-state mechanism, but when these reactions were reinvestigated by using stopped-flow methods to reduce the dead time of mixing, additional phases were discovered, and by the early seventies it was generally assumed that protein folding is a complex process that involves partially folded intermediates and multiple pathways (Kim and Baldwin, 1982; Kim and Baldwin, 1990). At the same time, the sensitivity of scanning microcalorimetry was greatly improved, and it became clear that, despite the complexity of the kinetics, the thermal unfolding transitions of many small proteins could be rigorously described by a two-state mechanism (Privalov, 1979). This apparent discrepancy between the thermodynamic and the kinetic data was tentatively explained by the kinetic heterogeneity of the unfolded state caused by prolyl isomerizations (as described above) and by the assumption that partially folded intermediates tend to be unstable in the transition region in which the stability data are collected.
The Chymotrypsin Inhibitor CI2
The first clear-cut example of a protein folding reaction that followed the two-state approximation not only in its thermodynamics but also largely in its kinetics was provided by the chymotrypsin inhibitor CI2 from barley (Jackson and Fersht, 1991a,b). At first glance, the refolding kinetics of CI2 were also complex and consisted of a major fast process (77% of the total amplitude) and two minor slow processes (together 23% of the total amplitude). The slow processes were accelerated when a prolyl isomerase was added and could thus be assigned to the reisomerizations of the five trans prolyl peptide bonds of this protein. The major fast-folding reaction was assumed to originate from the molecules with correct prolyl isomers. This reaction shows a time constant of 20 ms at 0 M GdmCl, and at the midpoint of the unfolding transition (near 4 M GdmCl), the folded and the unfolded molecules equilibrate with a time constant of about 200 s (Jackson and Fersht, 1991b). In the entire range of denaturant concentrations (0-8 M GdmCl), the logarithms of the rate constants of unfolding and of refolding depended linearly on the GdmCl concentration, as expected for a two-state reaction (cf. Equations 5 and 6). Thus, as shown in Figure 3, the measured rate constant λ follows a V-shaped profile (usually called a "chevron") with linear limbs at low and at high GdmCl concentration. Moreover, the values for the equilibrium constant K_D calculated from the ratio of the rate constants of unfolding and refolding agreed very well with the K_D values obtained from the equilibrium transition, and the difference of the kinetic m values was identical with the corresponding equilibrium value, as also required for a two-state reaction. Together, this provided very good evidence that the folding of this small protein indeed follows the simple two-state mechanism.
[Figure 3; horizontal axis: [GdnHCl] (M)]
Figure 3. Folding kinetics of the chymotrypsin inhibitor CI2. The natural logarithm of the measured rate constant of folding (k) is shown as a function of the guanidinium chloride concentration ([GdnHCl]). The kinetics were measured at 25 °C in 50 mM MES buffer, pH 6.3. The points between 0 and 0.59 M GdnHCl were obtained by pH-jump experiments (from pH 1.7, where CI2 is unfolded in the absence of denaturant), the other points from [GdnHCl] jump experiments. The solid curve is the best fit of the data to a two-state model; m values of −1.90 M^-1 and 1.24 M^-1 were obtained for refolding and for unfolding, respectively, from this fit. The insert shows the [urea] dependence of the measured rate constants for denaturation and renaturation for wild-type barnase. The solid curve is that calculated for a two-state system. The figure is reprinted with permission from Jackson and Fersht (1991b). Copyright [1991] American Chemical Society.
The Cold-Shock Protein CspB
Another small protein, the cold-shock protein from Bacillus subtilis named CspB (Jones et al., 1987; Willimsky et al., 1992) folds even faster than CI2. CspB, a protein of 67 residues, folds into a five-stranded (3-barrel; a helices are absent (Schnuchel et al., 1993; Schindelin et al., 1993). It does not contain cis prolyl bonds,
Protein Folding
167
disulfide bonds, or tightly bound cofactors. At 25 °C, CspB refolds from the urea-denatured state with a time constant of slightly less than a millisecond, and even in the middle of the urea-induced transition (where folding is slowest), the native and the unfolded molecules equilibrate very rapidly with a time constant of 30 ms (Schindler et al., 1995). This indicates that CspB can find its folded conformation in the time range of milliseconds even when the thermodynamic driving force (the Gibbs free energy of stabilization) is very small. In a simple two-state folding reaction, the measured rate constants of unfolding and refolding depend only on the final conditions. The corresponding amplitudes depend on both the initial and the final conditions and are equal to the difference in signal (e.g., in fluorescence or circular dichroism) of the protein between the two conditions (Tanford, 1968b). Thus the fraction of native molecules in a sample can be detected from the amplitude of their unfolding reaction, and the fraction of unfolded molecules from the amplitude of their refolding reaction. This provides a strict kinetic test of the two-state character of a protein-folding transition. The results of such a test for CspB are shown in Figure 4. The decrease in the fraction of N molecules (as determined from the amplitudes in the unfolding assays in Figure 4) follows closely the decrease in N as calculated from the two-state analysis of the equilibrium transition. Similarly, the increase in U molecules (as determined from the amplitudes in the refolding assays) closely followed the increase in U, again as
Figure 4. Kinetic unfolding and refolding assays to measure the fractions of native (N) and of unfolded (U) molecules, respectively, in the folding transition of the cold-shock protein CspB. The fractions of CspB molecules in the native (open symbols) and in the unfolded (•) state at pH 7.0 and 25 °C are plotted against the urea concentration (M). They were measured by the unfolding and refolding assays described in the text. The solid curves represent the fractions of N and U expected from the two-state analysis of the equilibrium unfolding transition measured under the same solvent conditions. The figure is reproduced with permission from Schindler and colleagues (1995).
expected from the two-state analysis of the equilibrium data. Throughout the transition, the sum of the fractions of N and U molecules was close to 1.0. The presence of equilibrium intermediates in the urea-induced transition of CspB could thus be ruled out.
Other Small and Fast-Folding Proteins
The acyl-coenzyme A binding protein, a small four-helix bundle protein with 86 residues, folds in a similarly rapid and simple fashion (Kragelund et al., 1995). At 0 M GdmCl, the time constant of folding is about 4 ms. As in the case of CI2 (cf. Figure 3), the measured rate constant of folding, λ, follows a chevronlike profile with linear limbs at high and at low denaturant concentration. The minimum of the chevron is at the midpoint of the equilibrium transition (2.3 M GdmCl), where the native and the unfolded protein equilibrate with a time constant of 14 s. Similarly rapid and simple folding reactions were also found for homologous acyl-coenzyme A binding proteins from other species (Kragelund et al., 1996). Folding reactions that are faster than a millisecond cannot be followed by stopped-flow spectroscopy, because the dead time of conventional rapid-mixing instruments is about one millisecond. Huang and Oas (1995) therefore used a dynamic NMR method to follow the very fast folding of λ(6-85), the amino-terminal fragment (residues 6-85) of the λ repressor. This fragment is largely helical, and it also folds in a two-state fashion. In the transition region, the rate constants of both unfolding and refolding could be determined from simulations of the exchange-broadened resonances in the NMR spectrum. Huang and Oas (1995) used the δH and εH resonances of Tyr-22 for the kinetic analysis for two reasons: these resonances are well resolved from other resonances at most urea concentrations, and they display significant line broadening in the transition region. The shift and the broadening of the Tyr-22 resonances as a function of increasing
Figure 5. Extremely rapid folding of the λ(6-85) fragment as followed by NMR line-shape analysis. (A) The aromatic ¹H NMR spectra at the different urea concentrations used for this analysis. The NMR samples consisted of 500 µM λ(6-85) in 99% ²H₂O/10 mM C²H₃COO²H/100 mM NaCl/1 mM NaN₃, pH 5.6, at 37 °C. The Tyr-22 resonances used for the kinetic analysis are indicated with asterisks. (B) Natural logarithm of the λ(6-85) refolding (circles) and unfolding (squares) rate constants as a function of the urea concentration. The open symbols were determined from the Tyr-22 εH resonance, and the solid symbols from the Tyr-22 δH resonance. The lines indicate weighted best fits of the data. The 0 M intercepts of the linear fits give a folding rate constant of 3600 ± 400 s⁻¹ and an unfolding rate constant of 27 ± 6 s⁻¹ in the absence of urea. The inset compares the fraction denatured ($F_d$), as calculated from the equation $F_d = k_u/(k_f + k_u)$, with the equilibrium unfolding transition as determined by the changes in CD (dotted curve). This figure is reproduced with permission from Huang and Oas (1995).
urea concentration are clearly seen in Figure 5A. A joint simulation of these spectra yielded the rate constants of unfolding and refolding between 1.35 M and 3.14 M urea at 37 °C. The logarithms of both rate constants depend linearly on the denaturant concentration (Figure 5B). At the midpoint of the transition, they have equal values of about 400 s⁻¹. The refolding rate constant extrapolated to 0 M urea (and 37 °C) is 3600 s⁻¹. This is an extremely high folding rate, which might even approach the rate of chain collapse. The value must be treated with caution, however, for two reasons. It is extrapolated, and the studied molecule is only a fragment of an intact protein, which might not reach the final folded state that it attains in the intact λ repressor. By using T2 relaxation analysis, however, it could be shown that the folding rate constant between 0 and 0.4 M urea is indeed between 3000 and 4000 s⁻¹ (Burton et al., 1996). The results with CI2, CspB, λ(6-85), and the acyl-CoA-binding protein, together with work on other small proteins (Khorasanizadeh et al., 1993; Viguera et al., 1994; Sosnick et al., 1996), clearly show that in protein folding the sequence information can be translated into the final, native three-dimensional structure within less than a millisecond. This is much faster than previously thought. These small proteins follow a two-state mechanism in their folding and, at the same time, they can fold extremely fast. This holds even at the midpoints of their unfolding transitions, where the equilibration between the native and the unfolded form is usually slowest. The property of fast two-state folding is clearly not related to the type of secondary structure: CspB is a β-sheet protein, the λ(6-85) fragment is helical, and the other small proteins contain both helices and sheets. It is, however, related to protein size; all these fast-folding proteins or fragments have fewer than 100 residues.
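The two-state analysis applied to λ(6-85) can be illustrated with a short numerical sketch. It assumes, as in the two-state model, that ln k_f and ln k_u are linear in the urea concentration. The 0 M intercepts are the values quoted in the caption of Figure 5, but the slopes `M_F` and `M_U` are hypothetical placeholders. The midpoint and rates the sketch prints are therefore illustrative, not the measured ones.

```python
import math

# Two-state model with linear free-energy relationships:
# ln k_f and ln k_u change linearly with the urea concentration.
# K_F0 and K_U0 are the 0 M intercepts quoted for lambda(6-85) at 37 °C;
# M_F and M_U are HYPOTHETICAL slopes chosen only for illustration.
K_F0 = 3600.0  # s^-1, refolding rate constant at 0 M urea
K_U0 = 27.0    # s^-1, unfolding rate constant at 0 M urea
M_F = 1.0      # M^-1, hypothetical decrease of ln k_f per M urea
M_U = 1.0      # M^-1, hypothetical increase of ln k_u per M urea


def k_fold(urea):
    """Microscopic refolding rate constant (s^-1) at a urea concentration (M)."""
    return K_F0 * math.exp(-M_F * urea)


def k_unfold(urea):
    """Microscopic unfolding rate constant (s^-1) at a urea concentration (M)."""
    return K_U0 * math.exp(M_U * urea)


def observed_rate(urea):
    """Apparent rate constant lambda = k_f + k_u; ln(lambda) versus urea is the chevron."""
    return k_fold(urea) + k_unfold(urea)


def fraction_denatured(urea):
    """F_d = k_u / (k_f + k_u), valid only for a two-state reaction."""
    return k_unfold(urea) / observed_rate(urea)


def midpoint():
    """Urea concentration (M) at which k_f = k_u, i.e. F_d = 0.5."""
    return math.log(K_F0 / K_U0) / (M_F + M_U)


if __name__ == "__main__":
    c_mid = midpoint()
    print(f"midpoint {c_mid:.2f} M, rate constants {k_fold(c_mid):.0f} s^-1")
    for c in (0.0, 1.0, 2.0, 3.0, 4.0):
        print(f"{c:.1f} M urea: lambda = {observed_rate(c):7.1f} s^-1, "
              f"F_d = {fraction_denatured(c):.3f}")
```

With equal slopes, the minimum of the chevron (the smallest λ) coincides with the midpoint, where k_f = k_u and F_d = 0.5. With unequal slopes, the minimum lies where m_f k_f = m_u k_u, slightly away from the midpoint.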
FOLDING INTERMEDIATES
Levinthal (1968) argued that the folding of a protein by a random search through all possible conformations would require an astronomical amount of time. This argument seemed to be in striking contrast to the experimental observation that many proteins complete their folding in much less than a second. This led most researchers to assume that folding is fast because partially folded intermediates can form very rapidly and that these intermediates direct the folding process to a few pathways or even a single productive pathway. In this way, an exhaustive search of the conformational space during the folding of a protein chain could be avoided. The proposal of folding pathways via intermediates proved to be fruitful, because it led to kinetic models that were amenable to experimental tests. Such tests consist essentially of two steps: first, intermediates are detected by an analysis of the folding kinetics, and second, the structures of these intermediates are elucidated. Kinetic evidence for intermediates in a folding reaction comes from multiexponential changes in a single signal and/or from differences in the kinetics when folding is monitored by different probes. Conclusive evidence for the presence of an obligatory on-pathway intermediate I in a U → I → N reaction is provided when the native protein N is formed with a lag at the beginning. Such a lag arises because N is produced from I, and I is not present at the beginning of folding. In practice, such lags have almost never been observed [but see Schmid (1983)], because the formation of intermediates is often several orders of magnitude faster than the subsequent rate-limiting step of folding. In this case, the lag in the formation of N would be extremely short and not detectable in the folding kinetics. The identification and characterization of intermediates is also difficult because intermediates do not necessarily accumulate to a high extent during folding. A productive intermediate could lower activation barriers so strongly that further folding to the native state becomes very fast. Such an intermediate would be extremely productive for folding, yet it would be very hard to detect. Similarly, intermediates that are in rapid equilibrium with the unfolded protein would not be detected easily by an analysis of the kinetics. Two major strategies are used to overcome these problems in the characterization of intermediates. In the first strategy, pulsed amide proton exchange together with high-resolution 2D-NMR techniques is used to label those backbone positions that become protected against exchange early in folding and remain protected during the subsequent folding steps (Udgaonkar and Baldwin, 1988; Roder et al., 1988). In the second strategy, conditions are sought under which folding intermediates exist at equilibrium, so that their properties can be elucidated at leisure. In the latter approach, it is of course essential to show that these equilibrium intermediates are related to the transient kinetic intermediates (Ptitsyn, 1992, 1995; Kuwajima, 1989, 1996).
Equilibrium Intermediates: The Molten Globule
Some proteins can indeed exist in partially folded conformations at equilibrium. These stable forms can often be produced by exposing a protein to moderate concentrations of a denaturant or to low pH, or by adding salts. Partially folded intermediates of different proteins share some basic properties. They are usually compact, with radii only 10-20% larger than that of the respective native protein. They show most if not all of the helicity of the native protein, as judged by the circular dichroism in the amide region. Specific nonlocal, tertiary packing interactions between the side chains seem to be absent in these forms. As a consequence, these intermediates show no circular dichroism in the aromatic region, and their ¹H-NMR signals are not as dispersed as in the native protein. Ohguchi and Wada were the first to propose the name "molten globule" for these partially folded forms (Ohguchi and Wada, 1984), and it has since been adopted by most workers in the field. Soon after its discovery in equilibrium transitions, the molten globule was suggested to be a central intermediate in the kinetics of folding as well. It is plausible to assume that early in folding the protein chain collapses rapidly into a roughly globular form. Such a collapse would be most effective when accompanied by the simultaneous formation of extensive hydrogen-bonded secondary structure. Hydrogen bonding reduces the polarity of the protein backbone and thus increases the driving force for the hydrophobic collapse. In turn, hydrogen bonds gain in stability when they are transferred into an environment of low polarity. This mutual reinforcement of hydrophobic interactions and hydrogen bonding suggests that, even at this early stage, folding is already a cooperative process. Investigations of the molten globule state have concentrated on a few small proteins, such as α-lactalbumin (Kuwajima, 1996), staphylococcal nuclease (Privalov, 1996), and myoglobin (Privalov, 1996; Barrick and Baldwin, 1993a,b; Hughson et al., 1990; Goldenberg, 1992). Intermediates of the molten-globule type are most readily observed in two kinds of proteins: those with weak tertiary structure (such as staphylococcal nuclease), and those in which the tertiary interactions have been weakened by removing tightly bound cofactors (such as the heme from myoglobin or the stabilizing Ca²⁺ ions from α-lactalbumin).
Apo-α-lactalbumin
Kuwajima (1977) observed that, in the equilibrium unfolding of apo-α-lactalbumin, the change in the CD of the aromatic residues occurs at a lower denaturant concentration than the change in the amide CD. He concluded that this protein loses its tertiary structure before its secondary structure. He also concluded that an intermediate is populated near 2 M GdmCl that lacks most if not all of the tertiary interactions but still retains the nativelike secondary structure. Apo-α-lactalbumin has a roughly bipartite structure and is divided into two subdomains. The α-helical subdomain consists of four helices, labeled A to D. The β-sheet subdomain consists of an antiparallel β-sheet and a small 3₁₀ helix. The protein is stabilized by four disulfide bonds, two in the α subdomain and two in the β subdomain. Contrary to the expectations from the initial molten globule concept, the structure of the equilibrium intermediate of apo-α-lactalbumin is nonuniform: the helical subdomain is highly structured, but the β subdomain is not. By using NMR spectroscopy and amide proton exchange methods, Dobson and coworkers showed that at least two helices are formed in the intermediate (Dobson, 1991; Alexandrescu et al., 1993). Kim and coworkers showed that the native disulfide pairing is favored in the helical subdomain but not in the β subdomain (Peng et al., 1995; Schulman et al., 1995; Wu et al., 1995; Schulman and Kim, 1996). Together, these results indicate that the α subdomain of apo-α-lactalbumin can adopt an overall nativelike but loosely packed structure. The stability of this structure is, however, very low. It is not sufficient to maintain the native set of all four disulfide bonds when they are allowed to isomerize in molecules in which one disulfide is reduced at a time (Ewbank and Creighton, 1991; Ewbank et al., 1995). The β subdomain seems to be unfolded in the molten globule.
A bipartite structure, composed of a largely folded and a largely unfolded subdomain, is also found for the equilibrium folding intermediate of staphylococcal nuclease (Carra and Privalov, 1995; Privalov, 1996).
Apo-Myoglobin
A partially folded form of apo-myoglobin (apo-Mb) can be produced by lowering the pH. The loss of the amide CD in the course of acid-induced unfolding of apo-Mb occurs in two stages, with a plateau near pH 4.5. This plateau originates from an equilibrium intermediate that appears partially unfolded as judged by CD (Hughson et al., 1990, 1991; Barrick and Baldwin, 1993b; Myers et al., 1995; Kay and Baldwin, 1996). Investigations by NMR and by amide proton exchange showed that part of the nativelike structure is still preserved in this intermediate, namely helices A, G, and H. The remainder of the molecule appears to be unfolded. The arrangement of these helices is reminiscent of the native form of myoglobin, an indication that some nativelike tertiary interactions must be present in the intermediate of apo-Mb. The folded structure of the intermediate could be extended to include the B helix by introducing helix-stabilizing mutations into this helix (Kiefhaber and Baldwin, 1995). Together, these data indicate that the partially folded equilibrium intermediates of three proteins (apo-α-lactalbumin, staphylococcal nuclease, and apo-myoglobin) show a bipartite conformation. In this conformation, one part of the protein has a rather well-defined tertiary structure with at least some nonlocal interactions, and the remaining part is largely unfolded. This is at variance with the original molten globule concept. That concept assumed that the intermediates have an overall loosened structure, characterized by nativelike secondary structure and the absence of tertiary interactions.
Kinetic Intermediates and Their Relation to Equilibrium Intermediates
Many unfolded proteins very rapidly adopt a new conformation (or a set of conformations) when they are diluted to refolding conditions. The protein molecules are already more compact at this stage, and they often show significant secondary structure, as indicated by rapid changes in the amide CD. These changes are typically complete within the dead time of stopped-flow mixing. Detailed structural information about early folding intermediates came from pulsed hydrogen/deuterium (H/D) exchange experiments and their analysis by 2D-NMR. In these experiments, the protection of amide protons against exchange with the aqueous solvent is measured as a function of the refolding time. Only the fate of those amide protons that become highly protected (by hydrogen bonding in stable secondary structure) can be followed by this technique. The pioneering amide proton exchange experiments were performed with pancreatic RNase A (Udgaonkar and Baldwin, 1988) and with cytochrome c (Roder et al., 1988). This method was then applied to several other proteins (Mullins et al., 1993; Matouschek et al., 1992; Baldwin, 1993). In all cases, a framework of secondary structure was found to form very early in refolding. The protection against exchange of the initially labeled amide protons increased with increasing duration of refolding (Udgaonkar and Baldwin, 1995).
For several proteins, pulsed H/D exchange was also used to compare the equilibrium and the kinetic folding intermediates. The equilibrium intermediates were investigated under nonphysiological conditions, such as low pH, high salt concentrations, or moderate concentrations of denaturant. Nevertheless, they seem to closely resemble the kinetic intermediates that accumulate transiently under native folding conditions. This was clearly demonstrated for apo-Mb. The helices (A, G, and H) that are ordered in the equilibrium molten globule at pH 4.5 (Hughson et al., 1990) are also the first ones to be formed early in refolding at neutral pH (Jennings and Wright, 1993). A high similarity between the equilibrium and kinetic intermediates was also found for lysozyme and for apo-α-lactalbumin (Miranker et al., 1991; Buck et al., 1993; Balbach et al., 1997). This close correspondence between the kinetic and the equilibrium folding intermediates stimulated efforts to resolve the three-dimensional structures of equilibrium intermediates by NMR (Evans et al., 1991; Redfield et al., 1994; Haezebrouck et al., 1995). Initial data on the structure of the molten globule intermediate of apo-α-lactalbumin are available. It has elements of the native structure predominantly in the helical domain, and it shows some nativelike nonlocal interactions between hydrophobic residues (Alexandrescu et al., 1993). The rate-limiting conversion of the kinetic molten globule of apo-α-lactalbumin to the native state is a cooperative process. Identical time courses were observed when folding was monitored by following several resolved resonances in real-time NMR spectroscopy (Balbach et al., 1996).
Characterization of Kinetic Folding Intermediates by Protein Engineering
Fersht and coworkers used a protein engineering strategy to characterize the structure of a folding intermediate of the small ribonuclease barnase (Fersht, 1993; Fersht and Serrano, 1993). They changed individual amino acid side chains by site-directed mutagenesis and then analyzed the resulting changes in both the equilibrium unfolding and the folding kinetics. From the observed changes, conclusions could be drawn about the contribution of the mutated residues to the energetics of the folded wild-type protein, of its folding intermediate, and of the activated state that separates them. Fersht's group studied more than 100 variants of barnase by this protein-engineering method. In this way, they were able to give a detailed picture of the folding intermediate that accumulates prior to the rate-limiting step of folding (Fersht et al., 1991, 1992; Serrano et al., 1992a,b,c; Matouschek et al., 1992, 1995a; Arcus et al., 1995). The central part of the antiparallel β-sheet and the carboxy-terminal part of the major α-helix of barnase are already formed in this intermediate, whereas loops and other peripheral regions seem to be still unfolded. These results were confirmed by pulsed amide proton exchange experiments (Matouschek et al., 1992).
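The logic of this protein-engineering approach can be made quantitative with the φ-value ratio commonly used in such analyses; the ratio itself is not spelled out in the text above. The sketch below is a simplification. It assumes a two-state reaction, so that both the change in stability and the change in activation free energy can be computed from microscopic rate constants. Barnase itself populates an intermediate, which requires a more elaborate treatment. All rate constants used with the sketch are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def phi_value(k_un_wt, k_nu_wt, k_un_mut, k_nu_mut, temp_k=298.15):
    """Phi value of a mutation, assuming two-state folding U <=> N.

    k_un: microscopic refolding rate constant (U -> N), s^-1
    k_nu: microscopic unfolding rate constant (N -> U), s^-1
    Returns (phi, ddg_eq, ddg_ts); free energies in J/mol.
    """
    rt = R * temp_k
    # Loss of stability of N relative to U: dG = RT ln(k_UN / k_NU) for each protein.
    ddg_eq = rt * (math.log(k_un_wt / k_nu_wt) - math.log(k_un_mut / k_nu_mut))
    # Loss of stability of the activated state relative to U, from the refolding rates.
    ddg_ts = rt * math.log(k_un_wt / k_un_mut)
    if abs(ddg_eq) < 1e-9:
        raise ValueError("mutation does not change stability; phi is undefined")
    return ddg_ts / ddg_eq, ddg_eq, ddg_ts


if __name__ == "__main__":
    # Hypothetical mutant: refolding 10-fold slower, unfolding unchanged.
    phi, ddg_eq, ddg_ts = phi_value(100.0, 0.01, 10.0, 0.01)
    print(f"phi = {phi:.2f}, ddG_eq = {ddg_eq / 1000:.1f} kJ/mol")
```

In this formalism, a φ value near 1 means that the interactions of the mutated side chain are already formed in the activated state. A value near 0 means that they form only after the rate-limiting step. Intermediate values are harder to interpret.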
The Role of Intermediates for Folding
The simple two-state approximation and the linear free-energy relationships, as discussed previously, make two predictions (Equations 5 and 6 and Figure 1C). The logarithm of the rate constant of unfolding should increase linearly when the denaturant concentration is increased, and the logarithm of the rate constant of refolding should increase linearly when the denaturant concentration is decreased. In fact, a linear increase of the logarithm of the refolding rate constant (as in Figure 3) is only rarely observed. In most cases, this rate constant levels off at low denaturant concentrations, as shown in the inset of Figure 3 for the folding of barnase. This phenomenon is now generally called a "rollover" in the folding rate (Baldwin, 1996). At low denaturant concentrations, the folding rate is nearly independent of the denaturant concentration; only at higher concentrations does its logarithm decrease linearly. Such a rollover was observed in the early folding experiments with RNase A (Nall et al., 1978), cytochrome c (Ridge et al., 1981), and lysozyme (Kato et al., 1982). It was usually assumed to originate from a change in the folding mechanism. Fersht and coworkers suggested that the rollover in the folding kinetics of barnase may be caused by the formation of nonproductive intermediates (Fersht, 1993). In this view, the observed rate of folding is smaller than the expected (linearly extrapolated) value because the accumulation of the intermediate retards folding. Ubiquitin is one of the small proteins that refold fast and show a pronounced rollover in the refolding kinetics (Khorasanizadeh et al., 1993). Roder and coworkers pointed out that a rollover is not necessarily caused by the accumulation of nonproductive intermediates. They could explain the refolding kinetics of ubiquitin equally well by an alternative model (Khorasanizadeh et al., 1996). In this model, an intermediate I is required on the productive pathway to maintain a high rate of folding, and I is in rapid equilibrium with the unfolded form (U ⇌ I) early in refolding.
At very low denaturant concentrations (in the region of the rollover), the intermediate is fully formed, and thus the maximal rate of folding is reached. With increasing denaturant concentration, the U ⇌ I equilibrium is gradually shifted toward U. As a consequence, the observed rate of folding decreases, because it depends on the fraction of molecules in the I state ($k_f = [\mathrm{I}] \times k_{IN}$, where [I] is this fraction and $k_{IN}$ is the microscopic rate constant of the I → N step). This model is plausible. Basically, it assumes that the intermediate I is productive for folding and that folding becomes progressively slower because the intermediate is destabilized with increasing denaturant concentration. A basic difference between Roder's model (Khorasanizadeh et al., 1996) and the simple two-state analysis should, however, be noted. In the two-state analysis, the decrease in the rate of refolding with increasing denaturant concentration (the kinetic m value of refolding) is assumed to originate from the difference in solvent accessibility between the unfolded state and the activated state of refolding. In Roder's model (Khorasanizadeh et al., 1996), this decrease originates instead from the denaturant-driven "unfolding transition" of the intermediate I. The m value of this transition determines the denaturant dependence of the measured rate constant of folding ($k_f = [\mathrm{I}] \times k_{IN}$). The microscopic rate constant of refolding, $k_{IN}$, is assumed to be independent of the denaturant concentration. In this case, the intermediate and the activated state of refolding do not differ in their interaction with the solvent. At first glance, this seems surprising, but it could point to a concept of folding in which the activated state would be the least stable intermediate of folding. The activated state of folding is discussed further below. The folding experiments on ubiquitin and the kinetic models proposed by Roder and coworkers (Khorasanizadeh et al., 1993, 1996) show that a rollover in the folding kinetics can be explained by off-pathway intermediates as well as by on-pathway intermediates. Clearly, further tests are needed to discriminate between these conflicting views about the role of intermediates in folding.

For a few proteins, there is convincing evidence that traps do exist in their folding, leading to the accumulation of refolding molecules in partially folded, stalled conformations. The slow refolding of RNase T1 is accelerated more than tenfold when an unfavorable contact is removed by the Trp59→Tyr mutation (Kiefhaber et al., 1990a). This mutation relieves a premature close contact between the bulky Trp59 and Pro39. This contact exists in the native protein and apparently also in a folding intermediate in which the rate-limiting step of folding, the trans → cis isomerization of Pro39, has not yet occurred. Premature nativelike structure can also retard the folding of disulfide-containing proteins (Weissman and Kim, 1991; Mucke and Schmid, 1994). In the refolding of cytochrome c, the refolding molecules are trapped in a nonproductive conformation. The reason is that, in the unfolded state, the heme is coordinated to a His residue rather than to the native ligand, Met80. In this respect, the nonnative heme ligation resembles the retardation of folding caused by incorrect prolyl isomers. When this misligation is suppressed by unfolding at low pH, the kinetic trap is avoided, and cytochrome c refolds with a time constant of about 15 ms (Sosnick et al., 1994, 1996).
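The on-pathway interpretation of the rollover proposed by Roder and coworkers can be sketched numerically. The sketch assumes a rapid U ⇌ I pre-equilibrium whose equilibrium constant K_UI = [I]/[U] decreases exponentially with denaturant concentration, and a denaturant-independent k_IN. All parameter values (`K_IN`, `K_UI0`, `M_I`) are hypothetical and are not fitted to the ubiquitin data. As noted above, an off-pathway model can produce similar curves, so this sketch does not discriminate between the two views.

```python
import math

# HYPOTHETICAL parameters for a U <=> I (rapid) -> N scheme; not fitted to ubiquitin.
K_IN = 1000.0  # s^-1, microscopic I -> N rate constant, denaturant-independent
K_UI0 = 50.0   # K_UI = [I]/[U] at 0 M denaturant
M_I = 2.0      # M^-1, decrease of ln K_UI per M denaturant


def fraction_intermediate(denaturant):
    """Fraction of the non-native molecules in I, from the rapid U <=> I pre-equilibrium."""
    k_ui = K_UI0 * math.exp(-M_I * denaturant)
    return k_ui / (1.0 + k_ui)


def k_fold(denaturant):
    """Observed refolding rate constant, k_f = [I] * k_IN."""
    return fraction_intermediate(denaturant) * K_IN


def k_fold_linear(denaturant):
    """Straight-line extrapolation of the high-denaturant limb (where [I] ~ K_UI)."""
    return K_UI0 * K_IN * math.exp(-M_I * denaturant)


if __name__ == "__main__":
    for c in (0.0, 0.5, 1.0, 2.0, 3.0, 4.0):
        print(f"{c:.1f} M: ln k_f = {math.log(k_fold(c)):6.2f}, "
              f"linear extrapolation = {math.log(k_fold_linear(c)):6.2f}")
```

In this model the local slope d ln k_f/d[D] equals -M_I/(1 + K_UI). It approaches zero at low denaturant (the rollover) and approaches -M_I at high denaturant, where ln k_f becomes linear, as in the refolding limb of a two-state chevron.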
THE RATE-LIMITING EVENTS IN PROTEIN FOLDING
To understand the kinetics of folding, it is necessary to characterize not only the intermediates but also the activated states of folding. The doubts about the kinetic importance of the populated intermediates have revived interest in the activated states of folding. The height of the activation barrier controls the rate of a folding reaction. It thus also controls the average time that a folding molecule spends in aggregation-prone nonnative conformations. Protein folding is a very complex process, and a priori it is not clear whether folding reactions follow the same rules as simple chemical reactions and proceed via well-defined transition states. Transition state theory assumes that the activated state exists in a pseudo-equilibrium with the ground state. Therefore, thermodynamic quantities such as the Gibbs free energy of activation ΔG‡, the enthalpy of activation ΔH‡, and the entropy of activation ΔS‡ can be determined from the rate constant k of a reaction and from its dependence on temperature. This is done in the same manner as the thermodynamic parameters of a reaction are determined from its equilibrium constant K. Transition state theory was developed to describe the formation and breakage of chemical bonds in the reactions of small molecules. Proteins are extremely large molecules. Their folding involves the cooperative formation of many weak noncovalent interactions, and the search for the native conformation proceeds along a very shallow energy gradient. It is thus conceivable that not a single path but many different paths exist between the unfolded and the native conformations. The extreme case would be that every protein molecule follows its own individual path in folding, as assumed, for example, in the jigsaw puzzle model (Harrison and Durbin, 1985). If there is no unique pathway, there can also be no unique thermodynamic transition state for folding. A central assumption of transition state theory is that the ground state and the activated state of a reaction exist in a pseudo-thermodynamic equilibrium, although the activated state is extremely short-lived. This is reasonable for a small molecule with very few degrees of freedom, but it might not hold for a protein. The conformational space accessible to a folding protein chain in the activated state is still very large, and it may not be possible to sample it during the short lifetime of the activated state. Clearly, whether folding is an activation-controlled reaction cannot be decided on theoretical grounds, because the nature of folding reactions is not well enough understood. Fortunately, protein folding reactions meet several experimental criteria for activation-controlled processes.
1. There is an energy barrier in protein folding that separates the native state from the unfolded conformations. This barrier is responsible for the two-state character of many equilibrium unfolding transitions, which resemble first-order phase transitions (Privalov, 1979).
2. The folding kinetics of small proteins obey the simple rules of chemical kinetics; that is, the changes in the physical properties of a protein during folding follow monoexponential kinetics. This confirms that there is indeed a common barrier between the unfolded and the native state and that virtually all molecules encounter this barrier in their refolding.
3. Identical rate constants are observed when unfolding and refolding experiments are performed under the same final conditions but start from different initial conditions. This shows that the protein molecules rapidly preequilibrate to the new solvent conditions at the onset of folding, and that this preequilibration does not affect the rate-limiting event of folding. In other words, the folding molecules do not retain a "memory" of their previous unfolding history, which in principle could differ between individual protein molecules. Of course, this does not hold for unfolded molecules that differ in the cis/trans isomerism of prolyl bonds, in the topology of disulfide-bonded loops, or in cofactor interactions, such as incorrect heme ligation in cytochrome c (Sosnick et al., 1994).
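Criteria 2 and 3, and the amplitude test used for CspB, follow from the kinetics of a two-state reaction. After a jump to new conditions, the fraction of native molecules relaxes single-exponentially with λ = k_f + k_u, which depends only on the final conditions. The amplitude, in contrast, depends on where the molecules started. The sketch below uses hypothetical rate constants and shows that the apparent rate constant recovered from jumps with different starting fractions is the same. For simplicity, it uses the known equilibrium value instead of fitting a curve, as one would with real data.

```python
import math


def relaxation(f_native_start, k_f, k_u, t):
    """Fraction of native molecules at time t (s) after a jump to the final conditions.

    Two-state U <=> N: the deviation from equilibrium decays with lambda = k_f + k_u,
    which depends only on the final conditions (k_f, k_u); the amplitude,
    f_native_start - f_eq, also depends on the starting state.
    """
    lam = k_f + k_u
    f_eq = k_f / lam
    return f_eq + (f_native_start - f_eq) * math.exp(-lam * t)


def apparent_rate(f_native_start, k_f, k_u, t1, t2):
    """Recover lambda from the decay of the deviation from equilibrium between t1 and t2.

    Undefined if the start equals the equilibrium fraction (zero amplitude).
    """
    f_eq = k_f / (k_f + k_u)
    d1 = relaxation(f_native_start, k_f, k_u, t1) - f_eq
    d2 = relaxation(f_native_start, k_f, k_u, t2) - f_eq
    return math.log(d1 / d2) / (t2 - t1)


if __name__ == "__main__":
    k_f, k_u = 30.0, 3.0  # hypothetical final-condition rate constants, s^-1
    for start in (0.0, 0.5, 1.0):  # fraction native before the jump
        amplitude = k_f / (k_f + k_u) - start
        rate = apparent_rate(start, k_f, k_u, 0.01, 0.05)
        print(f"start {start:.1f}: lambda = {rate:.2f} s^-1, amplitude = {amplitude:+.3f}")
```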
Together, these criteria suggest that protein folding is indeed an activation-controlled process, and transition state theory is therefore widely used to analyze the properties of the activated state. The activated state of a reaction is never populated at equilibrium. Its properties can therefore be inferred only indirectly, from the reaction rate constants and from how they change in response to changes made in the system. Two complementary experimental approaches are often used. In the first approach, the conditions of the folding experiment, such as the solvent composition and the temperature, are varied. The thermodynamic properties of the activated state are then determined from the resulting changes in the microscopic rate constants of unfolding and refolding. In the second approach, the folding protein itself is changed by site-directed mutagenesis to probe the contributions of the mutated residues to the energetics of the activated state. A prerequisite of both approaches is that the changes made in the conditions or in the folding protein do not change the activated state itself. They should change only its stability relative to the unfolded and the native state. The characterization of the activated state requires a thorough analysis of both equilibrium stability and folding kinetics. The measured apparent rate constant λ has to be decomposed into the microscopic rate constants of refolding and unfolding, $k_{UN}$ and $k_{NU}$. Such a decomposition is straightforward for two-state folding reactions (cf. Equations 1-6). It can become very difficult, however, when prolyl isomerizations are coupled with conformational folding, as in Equation 8 (Kiefhaber et al., 1992). The folding kinetics and the equilibrium stability are linked because the equilibrium constant of unfolding, $K_D$, is equal to the ratio of the microscopic rate constants ($K_D = k_{NU}/k_{UN}$). Therefore $K_D$ can be calculated from the values of $k_{UN}$ and $k_{NU}$ and compared with the $K_D$ values measured independently in equilibrium unfolding experiments. This cross-validation of kinetic and equilibrium data is essential for evaluating the reliability of the kinetic model and of the microscopic rate constants of folding derived from it.
Analysis of the Properties of Activated States of Folding
The Changes in the m Value and in the Heat Capacity
The effects of amino acid substitutions or of changes in the folding conditions on the energy of the activated state are usually analyzed on the basis of transition state theory, using a simple reaction diagram as in Figure 6. The activation Gibbs free energy of unfolding, $\Delta G^{\ddagger}_{NU}$, is equal to the difference in free energy between the native and the activated state. The activation Gibbs free energy of refolding, $\Delta G^{\ddagger}_{UN}$, reflects the difference in free energy between the unfolded and the activated state. $\Delta G^{\ddagger}_{NU}$ and $\Delta G^{\ddagger}_{UN}$ are calculated from $k_{NU}$ and $k_{UN}$ according to Equations 9 and 10.

$$\Delta G^{\ddagger}_{NU} = -RT \ln\left[\frac{h\,k_{NU}}{k_B T}\right] \qquad (9)$$
Protein Folding
[Figure 6: free energy plotted against the reaction coordinate for the unfolded state (U), the activated state and the native state (N). The activation and equilibrium free energies are indicated, and the refolding and unfolding directions are labeled.]
Figure 6. Reaction coordinate diagram for a simple two-state folding reaction. The axes are not drawn to scale. The activation energies are usually much higher than the equilibrium free energy of unfolding. This figure is reproduced with permission from Matthews (1987).
$$\Delta G^{\ddagger}_{UN} = -RT \ln\left[\frac{h\,k_{UN}}{k_B T}\right] \tag{10}$$
In Equations 9 and 10, $R$ is the gas constant, $T$ is the absolute temperature, and $h$ and $k_B$ are the Planck and Boltzmann constants, respectively. The relation between the kinetic and the equilibrium data is also apparent from the reaction diagram in Figure 6. The equilibrium free energy of unfolding, $\Delta G_{UN}$, is equal to the difference between the activation free energies: $\Delta G_{UN} = \Delta G^{\ddagger}_{NU} - \Delta G^{\ddagger}_{UN}$. Usually the temperature or the denaturant concentration is varied to shift the equilibrium between the native and the unfolded form of a protein (i.e., to unfold and refold it). The equilibrium constant of protein unfolding depends on temperature because the enthalpy and the heat capacity increase during heat-induced unfolding (Privalov, 1979). Similarly, the rate constants of unfolding and refolding depend on temperature. The corresponding activation enthalpies ($\Delta H^{\ddagger}_{NU}$ and $\Delta H^{\ddagger}_{UN}$) and activation heat capacities ($\Delta C^{\ddagger}_{p,NU}$ and $\Delta C^{\ddagger}_{p,UN}$) of unfolding and refolding can be determined from the temperature dependence of the microscopic rate constants. The activation entropies $\Delta S^{\ddagger}_{NU}$ and $\Delta S^{\ddagger}_{UN}$ can then be calculated from the respective $\Delta G^{\ddagger}$ and $\Delta H^{\ddagger}$ values.
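Equations 9 and 10 can be evaluated numerically, and the sketch below does so in Python. It also checks that the difference between the two barriers reproduces the equilibrium $\Delta G_{UN} = -RT\ln K_D$. The rate constants are hypothetical. The optional `prefactor` argument is an illustrative addition that replaces the Eyring value $k_BT/h$. It shows that the choice of preexponential factor shifts both barriers by the same amount and therefore cancels from their difference.

```python
import math

R = 8.314462618        # gas constant, J mol^-1 K^-1
KB = 1.380649e-23      # Boltzmann constant, J K^-1
H = 6.62607015e-34     # Planck constant, J s

def activation_free_energy(k, temp_k, prefactor=None):
    """Equations 9/10: Delta G = -RT ln(k / A), with A = kB*T/h unless overridden."""
    if prefactor is None:
        prefactor = KB * temp_k / H          # roughly 6e12 s^-1 near room temperature
    return -R * temp_k * math.log(k / prefactor)   # J mol^-1

T = 298.15
k_nu, k_un = 0.01, 100.0                     # hypothetical rate constants, s^-1
dg_nu = activation_free_energy(k_nu, T)      # barrier to unfolding
dg_un = activation_free_energy(k_un, T)      # barrier to refolding
dg_eq = -R * T * math.log(k_nu / k_un)       # Delta G_UN from K_D = k_NU / k_UN

# Swapping in another (hypothetical) prefactor shifts both barriers equally,
# so their difference, the equilibrium Delta G_UN, does not change.
A_ALT = 1e6
shift_nu = dg_nu - activation_free_energy(k_nu, T, prefactor=A_ALT)
shift_un = dg_un - activation_free_energy(k_un, T, prefactor=A_ALT)
print(f"{(dg_nu - dg_un) / 1000:.2f} kJ/mol vs {dg_eq / 1000:.2f} kJ/mol")
```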
FRANZ X. SCHMID
The activation heat capacities ($\Delta C^{\ddagger}_{p,NU}$ and $\Delta C^{\ddagger}_{p,UN}$) and the activation m values ($m^{\ddagger}_{NU}$ and $m^{\ddagger}_{UN}$) are of particular interest because they give information about the structure of the activated state. The heat capacity of a protein decreases upon refolding as nonpolar surface is withdrawn from the solvent. The fractions of the total heat capacity change that occur in the unfolding and in the refolding reaction thus report on how much hydrophobic surface is exposed in the activated state, relative to the unfolded and the native state. This is best illustrated by considering two limiting cases. If the activated state of folding is as open and solvent-accessible as the unfolded state, then $\Delta C^{\ddagger}_{p,UN}$ should be zero, and the entire change in heat capacity should occur in unfolding. Conversely, if the activated state were already as compact as the native state, then the entire change in heat capacity should occur in refolding, and $\Delta C^{\ddagger}_{p,NU}$ should be zero. The activation heat capacities are difficult to measure because they are determined from the curvature of Arrhenius plots. The first experiments, with lysozyme, indicated that the activated state of this protein appeared to be virtually nativelike, because the entire change in heat capacity seemed to occur in the refolding reaction (Segawa and Sugihara, 1984). A similar result was obtained for CspB (Schindler et al., 1995; Schindler and Schmid, 1996). For CI2, approximately half of the change in heat capacity was found to occur in unfolding and half in refolding. This indicates that the protein had lost about half of its solvent accessibility in the activated state of folding (Jackson and Fersht, 1991b). Related information is obtained from the kinetic analysis of denaturant-induced unfolding and refolding. The kinetic m values ($m^{\ddagger}_{NU}$ and $m^{\ddagger}_{UN}$) report on the change in the interaction of a folding protein with the denaturant molecules in the solvent (Tanford, 1968b; Schellman, 1978; Pace, 1986).
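The sketch below shows why an activation heat capacity appears only as curvature. It is written in Python and assumes a temperature-independent $\Delta C^{\ddagger}_p$ and hypothetical activation parameters. It uses an Eyring-type plot ($\ln(k/T)$ against $1/T$), a close relative of the Arrhenius plot mentioned above. The local slope of this plot equals $-\Delta H^{\ddagger}(T)/R$, so a nonzero $\Delta C^{\ddagger}_p$ makes the slope drift with temperature. Recovering $\Delta C^{\ddagger}_p$ therefore means differentiating a slope, and this amplifies experimental noise.

```python
import math

R, KB, H = 8.314462618, 1.380649e-23, 6.62607015e-34   # SI units

def eyring_rate(temp_k, dh0, ds0, dcp, t0=298.15):
    """Rate constant (s^-1) for activation parameters dh0 (J/mol), ds0 (J/mol/K) at t0
    and a temperature-independent activation heat capacity dcp (J/mol/K)."""
    dh = dh0 + dcp * (temp_k - t0)
    ds = ds0 + dcp * math.log(temp_k / t0)
    return KB * temp_k / H * math.exp(-(dh - temp_k * ds) / (R * temp_k))

def local_activation_enthalpy(temp_k, rate, dt=0.5):
    """Slope of ln(k/T) versus 1/T around temp_k, times -R (central difference)."""
    t1, t2 = temp_k - dt, temp_k + dt
    y1, y2 = math.log(rate(t1) / t1), math.log(rate(t2) / t2)
    return -R * (y2 - y1) / (1.0 / t2 - 1.0 / t1)

# Hypothetical refolding parameters; negative dcp: heat capacity is lost on the way to TS.
refold = lambda t: eyring_rate(t, dh0=20e3, ds0=-50.0, dcp=-2000.0)
h_cold = local_activation_enthalpy(283.15, refold)
h_warm = local_activation_enthalpy(313.15, refold)
dcp_estimate = (h_warm - h_cold) / (313.15 - 283.15)   # approximately -2000 J/mol/K

# Control: with dcp = 0 the plot is a straight line and the local slopes agree.
flat = lambda t: eyring_rate(t, dh0=20e3, ds0=-50.0, dcp=0.0)
print(round(dcp_estimate, 1))
```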
Again, the limiting cases illustrate the information gained from the denaturant dependence of the folding kinetics. When the activated state is already nativelike in its interaction with the solvent, the entire dependence on denaturant should reside in the rate constant of refolding, and the rate of unfolding should be independent of the denaturant concentration. Conversely, the rate of refolding should become independent of the denaturant concentration (i.e., $m^{\ddagger}_{UN} = 0$) when the activated state occurs very early and binds the denaturant as well as the unfolded protein does. In this case, $m^{\ddagger}_{NU}$ approximates the equilibrium m value. These considerations indicate that kinetic chevron plots ($\ln \lambda = f([D])$, as in Figure 3) carry easily recognized information about the activated state of folding. A "regular" chevron, with limbs of equal steepness at high and low denaturant concentrations, points to an activated state that is "half-native" in its interaction with the denaturing solvent. An asymmetric chevron with a flat unfolding limb points to a nativelike activated state. A chevron with a flat refolding limb points to an unfoldedlike activated state. For CspB, chevrons with an almost flat unfolding limb pointed to a highly nativelike activated state of folding (Figure 7) (Schindler et al., 1995; Schindler and Schmid, 1996).
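The link between the kinetic m values and the shape of a chevron can be made concrete with the Python sketch below. It assumes the widely used linear dependence of $\ln k$ on denaturant concentration, with refolding slowing and unfolding speeding up as [D] increases. The chapter's own Equations 5 and 6 are not reproduced in this section, so the parameter names, units and values here are assumptions.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1

def chevron(denat_m, k_un0, k_nu0, m_un, m_nu, temp_k=298.15):
    """Apparent rate constant lambda = k_UN + k_NU at denaturant concentration [D] (M).

    k_un0, k_nu0: rate constants without denaturant (s^-1).
    m_un, m_nu: kinetic m values of refolding and unfolding (kJ mol^-1 M^-1, positive).
    """
    rt = R * temp_k
    k_un = k_un0 * math.exp(-m_un * denat_m / rt)   # refolding limb
    k_nu = k_nu0 * math.exp(m_nu * denat_m / rt)    # unfolding limb
    return k_un + k_nu

def midpoint(k_un0, k_nu0, m_un, m_nu, temp_k=298.15):
    """Denaturant concentration where k_UN = k_NU, i.e., K_D = 1."""
    return R * temp_k * math.log(k_un0 / k_nu0) / (m_un + m_nu)

for d in range(0, 9, 2):                            # 0, 2, 4, 6, 8 M denaturant
    regular = chevron(d, 1000.0, 0.01, 5.0, 5.0)    # "half-native" activated state
    flat_u = chevron(d, 1000.0, 0.01, 5.0, 0.3)     # nativelike activated state
    print(d, round(math.log(regular), 2), round(math.log(flat_u), 2))
```

With a small `m_nu`, the high-denaturant limb of $\ln\lambda$ stays almost level, which is the flat unfolding limb described for CspB.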
[Figure 7: three panels of chevron plots showing the apparent rate constant $\lambda$ against [urea] (M) from 0 to 8 M. One panel is labeled 35 °C.]
Figure 7. The rapid folding kinetics of CspB. Dependence of the apparent rate constant $\lambda$ of unfolding (●) and refolding (○) on the urea concentration at different temperatures. A total of 327 data points (at 14 different temperatures) were subjected to a joint fit to the combined Equations 2, 5, and 6. The solid lines show the profiles for $\lambda$ calculated from the fit parameters in Table 1 for each temperature. The kinetics were monitored by the change in fluorescence above 300 nm in 0.1 M Na cacodylate/HCl, pH 7.0. The figure is reprinted with permission from Schindler and Schmid (1996). Copyright 1996 American Chemical Society.
The information contents of the temperature-dependent and the denaturant-dependent folding kinetics are related. Both report on the interactions of the activated state with the solvent. The $\Delta C^{\ddagger}_p$ criterion is sensitive to the interactions of the activated state with water (relative to the native and unfolded forms). The $m^{\ddagger}$ criterion is sensitive to the differential interaction of the activated state with the denaturant molecules. Indeed, these two criteria give similar results for several proteins. The transition state of folding of T4 lysozyme is about 75% native by both $m^{\ddagger}$ and $\Delta C^{\ddagger}_p$ (Chen et al., 1989, 1992). That of CI2 is 60% native by $m^{\ddagger}$ and 50% native by $\Delta C^{\ddagger}_p$ (Itzhaki et al., 1995; Oliveberg et al., 1995). The $m^{\ddagger}$ criterion may give a slightly more native character because the denaturant molecules are larger than water molecules and are thus more readily excluded from the folding protein chain.
$\Delta H^{\ddagger}$ and $\Delta C^{\ddagger}_p$ are obtained from the temperature dependence of the folding kinetics, whereas the $m^{\ddagger}$ values come from the dependence on the denaturant concentration. Measuring the folding kinetics as a function of both variables therefore allows a joint analysis that yields all thermodynamic activation parameters and their dependences on denaturant concentration and temperature. Such an analysis was performed for CspB. It was based on kinetic chevron plots measured at 14 temperatures between 2 °C and 45 °C, as shown in Figure 7. The unfolding limbs of these chevrons are almost flat at all temperatures. By the $m^{\ddagger}$ criterion, this indicates that the activated state in the folding of this protein is 96% nativelike at all temperatures. By the $\Delta C^{\ddagger}_p$ criterion, it is 90% native (Figure 8A) (Schindler and Schmid, 1996). The nativelike character of the activated state of CspB has interesting consequences for the refolding reaction of this protein. The equilibrium stability of proteins is generally small and results from a pronounced temperature-dependent compensation of enthalpy and entropy (Privalov, 1979). Because the activated state of folding of CspB is so close to the native state, this temperature-dependent enthalpy/entropy compensation is also reflected in the activation parameters of refolding, as shown in Figure 8B,C. At low temperature, both $\Delta H^{\ddagger}$ and $\Delta S^{\ddagger}$ of refolding are positive and very large. This suggests that, at low temperature, the refolding molecules encounter an enthalpic barrier. Both $\Delta H^{\ddagger}$ and $\Delta S^{\ddagger}$ of refolding decrease strongly with temperature, and at 37 °C they are already slightly negative (Figure 8B,C).
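The switch from an enthalpic to an entropic refolding barrier follows from a negative activation heat capacity of refolding that does not depend on temperature. Figure 8A also assumes a temperature-independent heat capacity. The short derivation below uses purely hypothetical numbers, not the fitted CspB parameters:

$$\Delta H^{\ddagger}_{UN}(T) = \Delta H^{\ddagger}_{UN}(T_0) + \Delta C^{\ddagger}_{p,UN}\,(T - T_0)$$

$$\Delta S^{\ddagger}_{UN}(T) = \Delta S^{\ddagger}_{UN}(T_0) + \Delta C^{\ddagger}_{p,UN}\,\ln\frac{T}{T_0}$$

Because heat capacity is lost on the way from U to the activated state, $\Delta C^{\ddagger}_{p,UN} < 0$, and both terms decrease as $T$ rises. The activation enthalpy changes sign at

$$T_H = T_0 - \frac{\Delta H^{\ddagger}_{UN}(T_0)}{\Delta C^{\ddagger}_{p,UN}}$$

For example, take $\Delta H^{\ddagger}_{UN}(283.15\ \mathrm{K}) = 60\ \mathrm{kJ\,mol^{-1}}$ and $\Delta C^{\ddagger}_{p,UN} = -2\ \mathrm{kJ\,mol^{-1}\,K^{-1}}$, both hypothetical. Then $T_H = 283.15 + 30 = 313.15\ \mathrm{K}$ (40 °C). Above $T_H$, $\Delta H^{\ddagger}_{UN} < 0$. Within the transition-state formalism, a positive barrier $\Delta G^{\ddagger}_{UN} = \Delta H^{\ddagger}_{UN} - T\Delta S^{\ddagger}_{UN}$ then requires $\Delta S^{\ddagger}_{UN} < 0$; that is, the barrier has become entropic.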
Apparently the barrier to refolding changes from being largely enthalpic to being largely entropic when the temperature is increased. This is expected for a folding reaction in which the major barrier on the route to the native state is the reduction of the huge number of conformations of the unfolded state to a few productive activated states (Bryngelson et al., 1995). This interpretation, however, is overly simple because it accounts only for changes within the folding protein chain itself and neglects the changes in the solvent. These changes in the solvent, and not the intraprotein interactions, are responsible for the decrease in heat capacity during folding. They are therefore also responsible for the observed strong decrease with temperature of the activation enthalpy and entropy of folding (Makhatadze and Privalov, 1994, 1995). The observed transition from an enthalpic to an entropic barrier is thus caused not by chain folding itself but by the changes in the solvent around the nativelike activated state; it is a consequence of the hydrophobic effect. $\Delta H^{\ddagger}$ of unfolding of CspB has a large positive value of about 100 kJ/mol under all conditions (Figure 8B). This indicates that the transition state of folding already has nativelike interactions with the solvent but still differs significantly from the fully folded state in its enthalpy. Probably, a fraction of the native stabilizing interactions are not yet formed in the transition state. As outlined, there are doubts whether protein folding reactions can be described by transition state theory. In particular, it remains unclear what value should be used for the preexponential factor in the Eyring equation (Equations 9 and 10). The values for $\Delta G^{\ddagger}$ and $\Delta S^{\ddagger}$ (but not for $\Delta H^{\ddagger}$ and $\Delta C^{\ddagger}_p$) depend on the magnitude of this preexponential factor (Equation 10), and therefore the absolute values of $\Delta S^{\ddagger}$ are not known. The absence of intermediates and the nativelike activated state in the folding of CspB may be correlated with the small size and the structural type of this protein. The stabilization of a small β sheet as in CspB requires extensive nonlocal interactions, and therefore incomplete sheets tend to be unstable. As a consequence, the critical activated state is reached only very late in folding.

[Figure 8: three panels along the reaction coordinate U, TS, N: heat capacity (A), enthalpy H (B), and the entropic term $-TS$ (C). Traces are shown for 10 °C, 25 °C, and 37 °C.]
Figure 8. Reaction profiles for the folding of CspB at pH 7.0, 0 M urea, going from the unfolded state (U) via the transition state (TS) to the native state (N). The heat capacity (panel A) is assumed not to depend on temperature in the range studied. The enthalpy H (panel B) and the corresponding entropic term $-TS$ (panel C) are shown for 10 °C, 25 °C, and 37 °C. The traces in B and C are arbitrarily aligned such that the values for U coincide. The figure is reprinted with permission from Schindler and Schmid (1996). Copyright 1996 American Chemical Society.

Mutational Analysis of Activated States
In these analyses, the protein itself is changed by mutation to map the energetics of the transition state. They complement and extend the results obtained from changing the folding conditions, and the combination of both approaches can lead to a detailed picture of the activated state of folding. The protein engineering approach was pioneered by Matthews and coworkers in their work on the complex folding reactions of the β subunit of tryptophan synthase and of dihydrofolate reductase (Matthews, 1987, 1993; Garvey and Matthews, 1989; Tweedy et al., 1990; Jennings et al., 1991; Tsuji et al., 1993; Saab-Rincon et al., 1996). Usually individual residues are mutated. These mutations can change the energy levels of the native, the unfolded, and the activated state of folding relative to one another (cf. Figure 9). If, in addition, an intermediate accumulates during folding, its energy can change as well. Mutations can thus act as local reporters for the contributions of individual regions of the protein to the stabilization free energy of the different states of folding. When the structure of the activated state of folding is probed by mutations, it is of course a necessary prerequisite that these mutations change only the relative stability of the activated state, but not its structure. It is further assumed that the mutations do not affect the unfolded state. In other words, the energy diagrams of the wild-type protein and the mutated protein are normalized relative to the energy of the unfolded protein (Figure 9). A detailed description of the protein engineering method is given by Fersht (1993). His group has used this method to characterize the activated states of folding of barnase and CI2 (Matouschek et al., 1990, 1992; Serrano et al., 1992a,c, 1993; Fersht et al., 1991; Itzhaki et al., 1995). As in the other strategies, it is essential to determine the influence of the mutations on both the equilibrium stability and the unfolding/refolding kinetics.
The equilibrium measurements give the change in the Gibbs free energy between the native and the unfolded state ($\Delta\Delta G_{eq}$). The kinetic measurements give the changes in the activation free energies of unfolding ($\Delta\Delta G^{\ddagger}_{NU}$) and of refolding ($\Delta\Delta G^{\ddagger}_{UN}$). The comparison of the kinetic and equilibrium free energy changes can then be used to probe the energetics of the transition state. This is immediately apparent for the two limiting cases. First, consider a destabilizing mutation that decreases only the rate of refolding but leaves the rate of unfolding unchanged. In this case, the entire change in stability caused by the mutation appears as a corresponding change in the activation energy of refolding: $\Delta\Delta G^{\ddagger}_{UN} = \Delta\Delta G_{eq}$, or $\Delta\Delta G^{\ddagger}_{UN}/\Delta\Delta G_{eq} = 1$. This ratio is called $\Phi$ by Fersht and coworkers (Fersht, 1993). A $\Phi$ value of 1, as in this case, suggests that in the activated state the position probed by mutagenesis is as native as in the native state. Both states are therefore affected to the same extent by the mutation, and the difference in free energy between them (and thus the rate of unfolding) is not affected. Conversely, when a mutation maps to a position that is already unfolded in the activated state, only the rate of unfolding is affected, and the $\Phi$ value is 0. Fractional values of $\Phi$ are difficult to interpret (Fersht, 1993). Generally, the perturbations caused by mutations should be as small as possible to minimize the risk of changing the folding mechanism itself. Double-mutant cycles can be used to reveal possible energetic interactions between sites in a protein. The protein engineering method was used to investigate the folding of barnase, and more than 100 variants have been employed in these studies (Fersht, 1993). As a result, structural data could be derived for a folding intermediate (described in the previous section) and for the activated state of folding, primarily from the analysis of the unfolding kinetics of the protein variants. In essence, the activated state of unfolding of barnase seems to resemble the folding intermediate. The center of the major antiparallel β sheet and the C-terminus of the helix are still formed, but several turns are already disordered (Fersht, 1993, 1995a; Matouschek et al., 1995a).

[Figure 9: free energy diagrams of a wild-type and a mutant protein, with the unfolded, intermediate, transition, and folded states. They show the apparent ratios $\Phi_I = \Delta\Delta G_I(\mathrm{app})/\Delta\Delta G_F(\mathrm{app})$ and $\Phi_{\ddagger} = \Delta\Delta G_{\ddagger}(\mathrm{app})/\Delta\Delta G_F(\mathrm{app})$.]
Figure 9. Free energy diagrams for the folding of a wild-type and a mutant protein. $E_U$, $E_I$, $E_{\ddagger}$, and $E_F$ are the energies of the unfolded, intermediate, transition, and folded states, respectively. The energy levels are normalized such that the energies of the unfolded states are the same. The definitions of the $\Phi$ values for the intermediate ($\Phi_I$) and for the transition state ($\Phi_{\ddagger}$), as used in Fersht's analysis of the folding mechanism, are also given. This figure is reproduced with permission from Fersht (1993).
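The $\Phi$ analysis for a two-state protein can be sketched in Python. This is a minimal sketch that assumes two-state kinetics for both the wild type and the mutant, with energies normalized to the unfolded state as in Figure 9. For brevity it derives $\Delta\Delta G_{eq}$ from the kinetic data. In practice, as stressed above, this value should also be checked against independent equilibrium measurements. The function name and the rate constants are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1

def phi_value(kf_wt, ku_wt, kf_mut, ku_mut, temp_k=298.15):
    """Phi = ddG(refolding barrier) / ddG(equilibrium) from two-state rate constants.

    kf = k_UN (refolding), ku = k_NU (unfolding), wild type (wt) and mutant (mut).
    """
    rt = R * temp_k
    ddg_un = rt * math.log(kf_wt / kf_mut)   # increase of Delta G(double dagger)_UN
    ddg_nu = rt * math.log(ku_wt / ku_mut)   # increase of Delta G(double dagger)_NU
    ddg_eq = ddg_un - ddg_nu                 # loss of stability, Delta G_UN(wt) - Delta G_UN(mut)
    if math.isclose(ddg_eq, 0.0, abs_tol=1e-9):
        raise ValueError("mutation does not change stability; Phi is undefined")
    return ddg_un / ddg_eq

print(phi_value(100.0, 0.01, 10.0, 0.01))   # refolding slowed only: Phi = 1
print(phi_value(100.0, 0.01, 100.0, 0.1))   # unfolding accelerated only: Phi = 0
print(phi_value(100.0, 0.01, 10.0, 0.1))    # both changed: Phi close to 0.5
```

Small values of $\Delta\Delta G_{eq}$ make $\Phi$ very sensitive to experimental error. This is one practical reason why fractional values are hard to interpret.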
In the case of CI2, mostly fractional $\Phi$ values were observed, which complicated an analysis based on the simple rules outlined above (Fersht, 1995b). These fractional values were nevertheless interpreted to indicate that most of the interactions in CI2 are in the process of being formed in the transition state of folding. Itzhaki et al. (1995) proposed that a folding nucleus develops in the transition state: "The onrush of stability as the nucleus consolidates its local and long range interactions is so rapid that it is not yet fully formed in the transition state" (Itzhaki et al., 1995). The authors thus conclude that the transition state in the folding of CI2 appears to be an expanded form of the native state (Otzen et al., 1994). In this state, a more strongly structured nucleus is assumed to consist only of the second half of the α helix and a distant residue that contacts the helix. In later work, the transition state of CI2 was found to be compact but rather uniformly unstructured (Tan et al., 1996). The application of protein engineering methods to protein folding has significantly expanded our knowledge of the elusive activated states of folding. The results on barnase indicate that the activated state resembles the intermediate that accumulates before the rate-limiting event of folding. The "intermediate" might in fact consist of a mixture of molecules that resemble each other in their gross structure, of which only a small fraction can accrete some crucial additional structure to reach the activated state. This would be close to the view that the activated state is the least stable obligatory intermediate on the folding pathway.

Intermediates and Activated States
Several proteins fold unusually fast. They are unrelated in three-dimensional structure, but all of them are small. Apart from ubiquitin (Khorasanizadeh et al., 1996), they follow N ⇌ U two-state mechanisms, both in their equilibrium transitions and in their folding kinetics. Rapid folding and the absence of populated intermediates thus seem to be correlated with the small size of these proteins. These small hydrophilic proteins cannot form stable intermediates that contain only part of the nativelike interactions. If such intermediates form transiently, they rapidly revert to the unfolded state. This instability of the intermediates prevents the folding protein from being trapped in incorrect, partially folded states and could thus contribute to the rapid folding. Several models for protein folding assume that productive intermediates should be unstable to maintain the high cooperativity of the folding process (Go, 1983; Creighton et al., 1996; Privalov, 1996; Jonsson et al., 1996). The activated state would be reached when the cooperativity of the emerging folded structure has increased to the point where the probability of making additional interactions becomes higher than the probability of losing already existing interactions. In this view, small proteins with very few cooperative interactions should not show intermediates and should reach their transition state very late in folding (Sosnick et al., 1996).
However, it is doubtful whether such a generalization is valid. A nativelike activated state, as in the folding of CspB, is not observed for other small, fast-folding proteins. The acyl-CoA binding protein (Kragelund et al., 1995, 1996), the fragment of λ repressor (Huang and Oas, 1995), cytochrome c (Sosnick et al., 1994, 1996), and the chymotrypsin inhibitor CI2 (Jackson and Fersht, 1991a,b) all show activated states that are about 50% nativelike by the m criterion. The first two proteins are predominantly α-helical, and CI2 is a mixed α/β protein. In contrast, CspB contains only β structure. Interestingly, the activated state of folding of another small, fast-folding β protein, the SH3 domain of spectrin (Viguera et al., 1994), is about 80% native by the m criterion. The inherent difficulty of forming a β-sheet structure is apparently not a disadvantage in the folding of these proteins. Rather, alternative nonnative structures are even less stable. They are therefore avoided, and productive folding is very fast and efficient. The nature of the transition state may thus correlate with the secondary structure of a protein. Unlike β sheets, α helices are local structures. Individual helices can therefore form very easily and, in the transition state, serve as kernels for the rapid assembly of the remaining structure. Such facile formation of local structure can of course also lead to nonnative structures and thus slow down overall folding. Although very useful, the concept of a unique transition state in folding may be misleading. An enormous number of weak, noncovalent interactions change in the process of folding, and there might be many different ways to reach and sample the energy landscape of the activated state.
The lifetime of a molecule in the activated state is probably not long enough to explore its entire conformational space. The assumption of a pseudoequilibrium between the ground state and the activated state is therefore probably not valid for folding reactions. As a consequence, the absolute values for activation free energies and activation entropies of folding derived from transition state theory remain ambiguous.

New Kinetic Methods to Follow Folding Events in the Submillisecond Range
Stopped-flow mixing techniques are commonly used to follow the kinetics of protein folding. The dead time of mixing in a conventional stopped-flow apparatus is about one millisecond. Faster events in folding, such as the formation of molten-globule intermediates or even the complete folding of several small proteins, therefore cannot be followed by the stopped-flow technique. Several strategies are used to overcome this time limit of one millisecond (Eaton et al., 1997). In the first approach, different ultrafast continuous-flow techniques are employed to reduce the dead time to about 0.1 ms. Such methods were used to measure submillisecond events in the folding of cytochrome c (Eaton et al., 1996; Takahashi et al., 1997; Chan et al., 1997; Shastry et al., 1998). Gray and coworkers developed an optical triggering method to initiate the folding of reduced cytochrome c. Cytochrome c is more stable in the reduced than in the oxidized form, and refolding can therefore be initiated by transferring an electron to unfolded oxidized cytochrome c. The electron for reduction was liberated from a reductant by irradiation with a short laser pulse (Pascher et al., 1996). The folding of cytochrome c could also be initiated by photodissociation of carbon monoxide (Chan et al., 1996). Irradiation with an infrared laser can raise the temperature of a solution by more than 10 degrees within a few nanoseconds. The heat-induced refolding of cold-denatured proteins can be measured after such laser temperature jumps. Gruebele and coworkers have used this method to identify events in the low-microsecond range in the folding of apomyoglobin (Ballew et al., 1996a,b). Nölting et al. (1995) followed the refolding of cold-denatured barstar after a conventional temperature jump and found a 300 μs phase in its folding. Oas and coworkers used dynamic NMR methods to measure the folding and unfolding rate constants of the helical 6-85 fragment of the λ repressor (Huang and Oas, 1995; Burton et al., 1996). This work is described above.
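Simple arithmetic shows why a 1 ms dead time hides fast phases and why reducing it to about 0.1 ms helps. The Python sketch below assumes an ideal single-exponential phase that starts at the moment of mixing. It computes the fraction of the amplitude already complete before the first data point. The rate constants are hypothetical.

```python
import math

def fraction_lost_in_dead_time(k_per_s, dead_time_s):
    """Fraction of a single-exponential phase already complete when observation begins."""
    return 1.0 - math.exp(-k_per_s * dead_time_s)

for dead_time in (1e-3, 1e-4):          # ~1 ms stopped flow vs. ~0.1 ms continuous flow
    for k in (1e2, 1e3, 1e4):           # hypothetical rate constants, s^-1
        lost = fraction_lost_in_dead_time(k, dead_time)
        print(f"dead time {dead_time * 1e3:.1f} ms, k = {k:.0e} s^-1: {lost:.1%} of amplitude lost")
```

For a phase with $k = 10^3\ \mathrm{s^{-1}}$, about 63% of the signal change is lost with a 1 ms dead time, but only about 10% with a 0.1 ms dead time.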
FOLDING OF DISULFIDE-CONTAINING PROTEINS

For proteins that contain disulfide bonds in the native state, these disulfide bonds must form in an oxidative reaction during the folding process. Disulfide-coupled protein folding was reviewed by Creighton (1978, 1985, 1986, 1990, 1992b) and Creighton et al. (1996). Disulfide bond formation requires covalent thiol/disulfide exchange with a redox system, such as a mixture of oxidized and reduced glutathione. In vivo, thiol/disulfide oxidoreductases such as eukaryotic protein disulfide isomerase or the DsbA and DsbC proteins of E. coli serve as oxidants in disulfide formation (Freedman, 1992; Freedman et al., 1994; Bardwell and Beckwith, 1993; Darby and Creighton, 1995; Wunderlich et al., 1995). The reactive species in thiol/disulfide exchange are the thiolate anions of the cysteine residues. The rates of oxidative folding are therefore strongly dependent on pH. Intermediates that differ in disulfide bonding can be efficiently stabilized by acidification or by covalent modification of all reactive thiol groups (e.g., with iodoacetic acid). In contrast to the elusive conformational folding intermediates, the intermediates of oxidative folding can thus be effectively trapped in a stable form and analyzed. The pathway of disulfide bond formation and its interrelation with conformational folding have been characterized in great detail for bovine pancreatic trypsin inhibitor (BPTI), primarily by Creighton and coworkers. BPTI will therefore be used as an example to outline the basic properties of oxidative folding reactions (Creighton et al., 1996). BPTI is a small protein with 58 amino acid residues. In the native state it contains three disulfide bonds, which link Cys5 with Cys55, Cys30 with Cys51, and Cys14 with Cys38. The oxidized protein shows a high conformational stability, but it unfolds upon reduction of the three disulfide bonds. To understand the mechanism of its oxidative folding, it is important to remember that the disulfides 30-51 and 5-55 are buried in the native protein, whereas 14-38 is accessible. It will become evident that the 14-38 bond has to form last in folding; otherwise, unpaired cysteines would become buried too early and thus be inaccessible for oxidation. At the beginning of the oxidative folding of BPTI, when folded structure is still absent, there is no significant preference for a particular pairing, and the first disulfide bond is formed almost at random. The rates of formation of these initial disulfide bonds seem to be determined largely by the chemistry of thiol/disulfide exchange and by the effects of nearest neighbors along the chain. Adjacent, positively charged groups increase the reactivity because they stabilize the thiolate anion. Cys14 and Cys38 are adjacent to positively charged groups (Lys15 and Arg39) and are therefore very reactive at the stage when the first disulfide bond forms. The one-disulfide intermediates interconvert rapidly. This increases the fraction of molecules with the 30-51 disulfide bond, because this native pairing induces the formation of partially native structure, which in turn stabilizes this disulfide bond (Darby and Creighton, 1993). This partial folding also orders the structure around Cys55 and decreases its reactivity in the subsequent steps. The second disulfides are thus formed primarily by reactions between Cys14, Cys38, and Cys5. As a consequence, several intermediates with two disulfide bonds accumulate. Two of them have a wrong disulfide (5-14 or 5-38) in addition to 30-51. They are denoted [30-51, 5-14] and [30-51, 5-38]. These intermediates contain a nonnative disulfide, not because it is stabilized by (nonnative) conformational forces, but because the nativelike conformation around Cys55 hinders the formation of the correct 5-55 bond. The [30-51, 14-38] intermediate has two correct disulfides.
It cannot complete folding, however, because the missing 5-55 bond cannot be formed. The position of this bond is buried in the interior of native BPTI and is thus not accessible for thiol/disulfide exchange reactions. The [30-51, 14-38] intermediate therefore has to rearrange to either [30-51, 5-14] or [30-51, 5-38] to continue folding. These two intermediates, each with a wrong disulfide, are flexible. Among other species, they can rearrange to the intermediate [30-51, 5-55], in which both interior disulfide bonds are formed. This is the slowest process in the folding of BPTI. The accessible disulfide bond 14-38 is then formed very rapidly in the final step of folding. Nativelike intermediates clearly predominate in the folding pathway of BPTI (Weissman and Kim, 1991, 1992). Rearrangements via intermediates with incorrect disulfides are nevertheless still required to complete folding (Weissman and Kim, 1995). The disulfide-coupled folding pathway of BPTI illustrates beautifully how conformational folding is coupled with disulfide bond formation and isomerization. It also shows that the premature formation of native interactions can lead to traps and that the highly populated intermediates are not the most productive ones. Partially folded intermediates of this kind are not common in disulfide-coupled folding; RNase A and RNase T1 hardly show such species in their folding. Their folding pathways have been reviewed by Creighton (1995).
FOLDING OF TWO-DOMAIN PROTEINS

Most of the concepts and the experimental work on protein folding discussed so far refer to small, monomeric, single-domain proteins, typically of about 100 amino acid residues. These proteins represent the basic units of folding and are thus the molecules of choice for studying the mechanism of folding itself. Moreover, these proteins often unfold and refold reversibly, which is a prerequisite for quantitative analyses of the thermodynamics and the kinetics of folding. Many proteins, however, consist of more than one domain or of more than one polypeptide chain. Their folding is not complete when the individual domains or protein chains have folded. These units must assemble or associate at some stage before they can fold further to the final native state. The folding of the individual domains is often rapid. The docking of domains or the association of subunits, however, can be very slow and thus rate-limiting for the overall folding process. Large proteins require minutes or even hours to reach the native state, rather than the seconds or milliseconds needed by small proteins. Several problems are thus encountered in the folding of large proteins. Long-lived unassembled domains or subunits can aggregate, and incorrect domain pairing can lead to nonfunctional material. From these considerations, it is clear that the folding mechanisms of large proteins are much more difficult to elucidate than those of small proteins. Finding conditions under which a particular large protein folds reversibly in vitro is still an art. Scientists working with purified proteins are not alone in having to struggle with unproductive folding and aggregation. These side reactions also pose major problems for folding in the cell, where different chaperone systems suppress side reactions and support productive folding (Hartl, 1996; Buchner, 1996; Lorimer, 1996).
In describing the folding of large proteins, I shall proceed from "simple" to "complex." I will begin with the meaning of domain. I will then discuss the folding of monomeric two-domain proteins, the folding and association of a small dimeric protein with single-domain subunits, and finally oligomeric proteins whose monomers already consist of several domains. An excellent review of the folding of large proteins was written by Garel (1992). The word domain is frequently used in protein science, but its meaning often remains vague. Originally, a domain referred to a stable unit of a protein that was produced by limited proteolysis and could be investigated in the same way as an intact protein. Domains as units of structure are often apparent in protein crystal structures. Sometimes they are easily seen by eye. In other instances, computer programs are necessary to identify structural domains, for example by analyzing the local packing density (Lesk and Rose, 1981; Zehfus and Rose, 1986; Janin and Wodak, 1983). Domains as functional units (such as the nucleotide-binding domains) are often identical with the domains identified by limited proteolysis or by X-ray crystallography. A domain can also be a thermodynamic unit of
Protein Folding
191
a protein, which can unfold and refold by itself in a cooperative process (Privalov, 1982). Finally, the term domain is also used in genetics for recurring sequence motifs or for exons. The domain concept is lucidly described by Garel (1992). A folding unit is a part of a protein that can fold by itself to an assembly-competent stage. In this respect, it resembles most closely the thermodynamic definition of a domain. Domains as folding units should thus represent a contiguous section of the protein chain. The strictest criterion for a folding domain is that it be able to fold to a cooperative nativelike structure in isolation. Of course, a conformation exactly like that in the native protein cannot be reached because the subsequent docking with the other domains is necessarily coupled with further folding. As mentioned, independently folding domains are perhaps best identified by calorimetry (Privalov, 1982). Two peaks are found in the melting curve of an intact two-domain protein when thermal unfolding is followed by the change in heat capacity (Figure 10). Three processes occur in the thermal unfolding of a two-domain protein: the
20 • 0
1 20
' 40
1 60
' 80
L__ 100
T/°C Figure 10. Denaturation of a two-domain protein as measured by differential scanning microcalorimetry. The increase in heat capacity upon unfolding of troponin C is shown as a function of temperature (in 10 m M cacodylate buffer, p H 7.25, 10 m M EDTA, 9.5 m M CaCI 2 ). The continuous line shows the experimental data; the dot-dash and the dashed line show the first and the final approximations, respectively, of the baseline for the intrinsic heat capacity of the protein. This figure is reproduced with permission from Privalov and Potekhin (1986).
FRANZ X. SCHMID
192
separation of the two domains, the unfolding of the first domain, and the unfolding of the second domain. For most two-domain proteins, the first peak in the calorimetric curve represents the cooperative loss of the domain-domain interaction and the unfolding of the less-stable domain (Brandts et al., 1989). The second peak represents the unfolding of the more stable domain. As a consequence, the unfolding transition of the less stable domain is often shifted to a lower temperature when this domain is studied in isolation, or it could even be unstable in the absence of the stabilizing domain-domain interactions. The more stable domain should ideally unfold at the same temperature (or denaturant concentration) in the intact protein and in isolation. The Immunoglobulin Light Chain
The immunoglobulin (Ig) light chain consists of two domains of slightly more than 100 residues each, the constant domain CL and the variable domain VL. These domains are easily identified in the crystal structure and can be produced by limited proteolysis of the intact light chain. The domain-domain interactions are very weak in this case, and the unfolding transition of the intact protein is fairly well modeled by the superposition of the unfolding transitions of the two isolated domains (Figure 11). Also, the CD spectrum of the intact light chain equals the sum of the spectra of the domains (Tsunenaga et al., 1987). Together, these results indicate that the VL and CL fragments represent independently folding units of the Ig light chain. Goto, Hamaguchi, and coworkers (1982a,b; Tsunenaga et al., 1987) investigated the folding kinetics of these two domains in detail, both in the presence and in the absence of the single intradomain disulfide bond. When folding in isolation, the VL and CL fragments follow patterns that were found for small single-domain proteins of similar size. Molecules with correct prolyl isomers fold rapidly, within about 100 milliseconds, and molecules with incorrect prolyl isomers fold slowly, within a few minutes. This slow folding is preceded by the formation of a nativelike folding intermediate. The fast folding reactions show a chevronlike dependence on the denaturant concentration (Figure 12), as found for small single-domain proteins. The refolding of the VL fragment is slower than that of CL, and it does not show a measurable fast-folding reaction. VL contains two cis prolyl bonds in the folded state (Tsunenaga et al., 1987). Therefore the fraction of fast-folding molecules with correct (cis) isomers is very small, and slow refolding is further decelerated because two prolyl isomerizations must occur to reach the native state.
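The effect of two cis prolyl bonds on the refolding of VL can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that each Xaa-Pro bond in the unfolded chain carries 20% of the native cis isomer, that the bonds equilibrate independently, and that each isomerization is an irreversible first-order step with a hypothetical rate constant of 0.01 s⁻¹; none of these numbers is taken from the VL studies, and the helper names are illustrative.

```python
import math


def fraction_fast_folders(native_isomer_fractions):
    """Fraction of unfolded molecules in which every prolyl bond already has
    its native isomer, assuming the bonds equilibrate independently."""
    fraction = 1.0
    for p in native_isomer_fractions:
        fraction *= p
    return fraction


def fraction_isomerized(t, k_iso, n_bonds):
    """Fraction of molecules (all starting with n_bonds non-native prolyl bonds)
    that have completed every isomerization by time t, treating each bond as an
    independent, irreversible first-order step with rate constant k_iso (s^-1)."""
    return (1.0 - math.exp(-k_iso * t)) ** n_bonds


def half_time(k_iso, n_bonds):
    """Time at which half of those molecules have completed all isomerizations."""
    return -math.log(1.0 - 0.5 ** (1.0 / n_bonds)) / k_iso


if __name__ == "__main__":
    # Hypothetical values: 20% native (cis) isomer per bond in the unfolded
    # chain and k_iso = 0.01 s^-1 for each isomerization.
    print("fast folders, one cis bond: ", fraction_fast_folders([0.2]))
    print("fast folders, two cis bonds:", fraction_fast_folders([0.2, 0.2]))
    print("t1/2, one bond  (s):", round(half_time(0.01, 1), 1))
    print("t1/2, two bonds (s):", round(half_time(0.01, 2), 1))
```

Under these assumptions, the fraction of fast folders drops from 20% to 4%, and when two isomerizations are required the slow phase shows a lag and a half-time about 1.8 times longer than for a single isomerization.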
The slow refolding of VL is reminiscent of the folding of RNase T1, a small protein (104 residues) with two cis prolyl bonds (Kiefhaber et al., 1990b; Mayr et al., 1996). The folding of the intact Ig light chain largely reflects the folding kinetics of its individual CL and VL fragments (Figure 12; Tsunenaga et al., 1987). This is very clear in unfolding. The unfolding of the intact light chain consists of two phases, which correspond to the unfolding reactions of the two fragments. This demonstrates that the CL and VL domains unfold independently in the intact protein. In the refolding of the intact protein, the folding kinetics of the domains (as measured in isolation) are clearly visible as individual phases. In addition, an extra slow phase is present. This suggests that the interactions between the domains form after the domains have folded independently, and that this slow reaction determines the rate of the final folding event.
[Figure 11: fraction of unfolded protein versus [Gdn-HCl] (M), 0 to 3 M; panel a, intact light chain Oku; panel b, VL and CL fragments]
Figure 11. Equilibrium unfolding transitions of the immunoglobulin light chain Oku (squares in panel a) and its VL fragment (circles in panel b) and CL fragment (triangles in panel b) at pH 7.5 and 25 °C. The ordinate represents the fraction of unfolded protein. The unfolding transitions were measured by the change in protein fluorescence at 350 nm (after excitation at 295 nm). Unfolding was reversible, as shown by the coincidence of the values obtained by unfolding from 0 M Gdn-HCl to the indicated concentrations of Gdn-HCl (open symbols) and the values obtained by refolding from 4.0 M Gdn-HCl to the indicated concentrations of Gdn-HCl (solid symbols). The continuous lines indicate the theoretical curves calculated for a two-state mechanism. The dotted line in panel a indicates the transition curve of the light chain as calculated from the transition curves of the individual domains in panel b. The figure is reprinted with permission from Tsunenaga et al. (1987). Copyright [1987] American Chemical Society.
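The statement that the transition of the intact light chain is well modeled by the superposition of the domain transitions can be written out explicitly. The sketch below assumes two-state unfolding for each domain, the common linear extrapolation model for the denaturant dependence of the unfolding free energy, and equal fluorescence contributions from the two domains; the midpoints and m-values are hypothetical and are not the published values for VL and CL.

```python
import math

R = 8.314e-3  # gas constant in kJ mol^-1 K^-1


def fraction_unfolded(c, c_mid, m, T=298.15):
    """Two-state fraction unfolded at denaturant concentration c (M), using the
    linear extrapolation model dG(c) = m * (c_mid - c), m in kJ mol^-1 M^-1."""
    dG = m * (c_mid - c)  # unfolding free energy in kJ/mol (> 0 below c_mid)
    return 1.0 / (1.0 + math.exp(dG / (R * T)))


def superposition(c, domains, weights):
    """Signal-weighted fraction unfolded of a protein whose domains unfold
    independently; domains is a list of (c_mid, m) pairs and weights gives
    each domain's share of the total signal change."""
    total = sum(weights)
    return sum(
        w * fraction_unfolded(c, c_mid, m) for (c_mid, m), w in zip(domains, weights)
    ) / total


if __name__ == "__main__":
    # Hypothetical parameters, not the published values for VL and CL.
    domains = [(1.0, 15.0), (1.6, 20.0)]
    weights = [1.0, 1.0]
    for c in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]:
        print(f"{c:3.1f} M  f_U = {superposition(c, domains, weights):.3f}")
```

If the domain-domain interactions were significant, the measured transition of the intact chain would deviate from such a weighted sum, which is what the comparison with the dotted line in Figure 11a tests.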
[Figure 12: (a) apparent rate constants and (b) relative amplitudes versus [Gdn-HCl] (M)]
Figure 12. Unfolding and refolding kinetics of the immunoglobulin light chain Oku and its VL and CL fragments. Panel a shows the apparent rate constants, and panel b shows the relative amplitudes, as a function of the Gdn-HCl concentration at pH 7.5 and 25 °C. (a) Circles: unfolding kinetics obtained by stopped-flow fluorescence measurements; triangles: refolding kinetics measured by fluorescence after manual mixing and after stopped-flow mixing. (b) Circles: amplitude of the fast phase of unfolding; squares: amplitude of the fast phase of refolding. The solid lines 1 and 2 in (a) indicate the values of the measured rate constants λ2 and λ1, respectively, of the CL fragment. The dotted line indicates the apparent rate constant of the VL fragment. The solid line in (b) indicates the curve for the amplitude of the fast phase of the intact light chain calculated by assuming that the VL and CL domains unfold and refold independently. The figure is reprinted with permission from Tsunenaga and colleagues (1987). Copyright [1987] American Chemical Society.
γII Crystallin
The crystallins of the eye lens are remarkable proteins. Unlike most other proteins, which are constantly degraded and resynthesized, the crystallins are stable and retained for the entire life of the organism. Therefore much effort has been devoted to determining whether they show unusual thermodynamic stability or unusual folding kinetics. γII crystallin is the best-studied crystallin. It is a monomeric protein composed of two immunoglobulinlike domains of about equal size and secondary structure. The two domains are arranged "side by side" and not "end on" as in the immunoglobulin chains (Wistow et al., 1983).
The urea-induced equilibrium unfolding of γII crystallin occurs in two stages, with midpoints near 2 M and 5 M urea (Figure 13), suggesting that the two domains differ strongly in stability. Experiments with the isolated domains confirmed this. The transition near 5 M urea is also seen for the N-terminal domain alone and could thus be assigned to this domain. The isolated C-terminal domain was partially unfolded even in the absence of urea and unfolded completely when 1 M urea was added (Figure 13). This provides clear evidence that in intact γII crystallin the C-terminal domain is stabilized by the domain-domain interactions, and that in the first unfolding transition near 2 M urea (Figure 13) the dissociation of the two domains and the unfolding of the less stable C-terminal domain occur in a concerted process (Rudolph et al., 1990; Mayr et al., 1994a; Jaenicke, 1996b).
The folding kinetics of γII crystallin as a function of urea concentration are very interesting (Figure 14; Rudolph et al., 1990). Basically, they are composed of two interdigitated chevrons, one with a minimum near 2 M urea and the other with a minimum near 5 M urea. The chevron at 5 M urea represents the unfolding and refolding kinetics of the more stable N-terminal domain. This reaction occurs in molecules in which the C-terminal domain is already unfolded. Identical chevron profiles were obtained for the folding of this domain as part of the intact protein and as an isolated domain, showing that the folding reaction in the N-terminal chain region is not changed when (in the intact protein) a long stretch of unfolded chain (the C-terminal region) is present. The chevron centered at 2 M urea is more difficult to interpret. This reaction must involve both the folding of the less stable C-terminal domain and its coalescence with the already folded N-terminal domain. This could occur in two different ways. The C-terminal domain could fold by itself, and the coalescence with the N-terminal domain would simply stabilize the folded structure and thus drive the reaction. Alternatively, the first, already folded domain could form a scaffold onto which the second domain collapses to find its native conformation. The kinetic data do not allow discrimination between these two possibilities.
[Figure 13: upper panel, unfolding of intact γII crystallin (fluorescence, left ordinate; sedimentation coefficient, right ordinate); lower panel, isolated domains; abscissa [urea] (M)]
Figure 13. Denaturation of γII crystallin by urea in 0.1 M NaCl/HCl, pH 2.0, 20 °C. Upper panel: unfolding transitions of the intact protein, as measured by fluorescence emission (●, left ordinate) and by the sedimentation coefficient in ultracentrifugation (○, right ordinate). Lower panel: unfolding transitions of the isolated amino-terminal (○) and carboxy-terminal (●) domains, as measured by fluorescence. This figure is adapted with permission from Rudolph and colleagues (1990).
[Figure 14: rate constants of unfolding and refolding versus [urea] (M)]
Figure 14. Dependence of the rate constants of denaturation and renaturation of γII crystallin on urea concentration in 0.1 M NaCl/HCl, pH 2.0, at 20 °C. Folding was probed by the fluorescence emission of intact γII crystallin at 360 nm and at 320 nm, by fast protein liquid chromatography gel filtration, by the fluorescence emission of the intermediate (I) at 360 nm and at 320 nm, and by the fluorescence emission of the amino-terminal fragment at 360 nm and at 320 nm. An N ⇌ I ⇌ D mechanism was used to explain the data, and the limbs of the individual chevrons are labeled with the processes that are followed in the particular regions. This figure is reproduced with permission from Rudolph and colleagues (1990).
The Ig light chain and γII crystallin represent two variations on the same theme. In the Ig light chain, the two domains show negligible mutual interactions and are almost independent of each other in both stability and folding kinetics. In γII crystallin, only one domain is stable in isolation. The other domain requires a strong domain-domain interaction to remain stably folded in the native protein, and it can reach its native state only when the other domain has already folded.
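The two-stage unfolding of γII crystallin corresponds to the N ⇌ I ⇌ D scheme used for Figure 14, in which I has a folded N-terminal and an unfolded C-terminal domain. The sketch below computes equilibrium populations for this scheme with step midpoints at 2 M and 5 M urea, as in Figure 13. The linear extrapolation model is assumed for both steps, and the m-values of 8 kJ mol⁻¹ M⁻¹ are assumptions chosen only for illustration.

```python
import math

RT = 8.314e-3 * 293.15  # kJ/mol at 20 °C, the temperature of Figures 13 and 14


def populations(urea, c1=2.0, m1=8.0, c2=5.0, m2=8.0):
    """Equilibrium fractions (N, I, D) for N <=> I <=> D. Each step follows the
    linear extrapolation model dG_i = m_i * (c_i - urea); c1 and c2 are the step
    midpoints (M) and m1, m2 the m-values (kJ mol^-1 M^-1), all hypothetical."""
    K1 = math.exp(-m1 * (c1 - urea) / RT)  # [I]/[N]: C-terminal domain unfolded
    K2 = math.exp(-m2 * (c2 - urea) / RT)  # [D]/[I]: N-terminal domain unfolded too
    z = 1.0 + K1 + K1 * K2  # sum of populations relative to N
    return 1.0 / z, K1 / z, K1 * K2 / z


if __name__ == "__main__":
    for urea in range(0, 8):
        n, i, d = populations(float(urea))
        print(f"{urea} M urea  N={n:.2f}  I={i:.2f}  D={d:.2f}")
```

With these parameters the intermediate is almost fully populated near 3.5 M urea, which is why the two stages appear as separate transitions; with less widely separated midpoints, N, I, and D would coexist over a broad range.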
FOLDING AND ASSOCIATION OF OLIGOMERIC PROTEINS
Protein association must be specific, and complementary surfaces are necessary for the mutual recognition of cognate binding partners. Equally important, unwanted associations with incorrect partners must be avoided. This may not always be easy. The β subunits of the heterodimeric bacterial luciferase form unproductive ββ dimers instead of the native αβ dimers when α chains are not available (Clark et al., 1993; Baldwin et al., 1993; Ziegler et al., 1993). The folding of oligomeric proteins has been reviewed extensively by Jaenicke (1987, 1995, 1996a; Jaenicke et al., 1979). Association must be preceded by folding steps in the monomers that generate the surface structures necessary for specific association. Folding continues after association to reach the final native conformation of the complex. The rate-limiting event can be identified by measuring the rate of folding as a function of protein concentration. Monomolecular folding steps are independent of protein concentration, whereas association reactions become faster when the concentration is increased. A folding reaction that is rate-limited by association at low concentration (and thus shows second-order kinetics) can turn into a first-order reaction at high concentration, when the association becomes faster than a preceding or following monomolecular step. Overall folding of a multisubunit protein is thus a succession of monomolecular and bimolecular steps, all of which are presumably accompanied by conformational rearrangements.
The arc repressor of phage P22 is a very small homodimer of two polypeptide chains of only 53 residues each. The native dimer is largely helical, and the individual monomers are strongly intertwined. Thus they create a large interaction surface, and the folded dimer resembles a single folded domain.
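The concentration test for a rate-limiting association step can be made quantitative with two textbook rate laws. The sketch below assumes that dimerization 2U → D is irreversible and follows d[U]/dt = -k2[U]^2 (with the alternative convention d[D]/dt = k2[U]^2, all times are halved), and it estimates the combined effect of a bimolecular and a monomolecular step simply by adding their characteristic times, which is only a rough approximation. The numbers, a bimolecular rate constant of 10⁷ M⁻¹ s⁻¹ and a 5 ms monomolecular step, are the orders of magnitude reported for the arc repressor in the following paragraph (Milla and Sauer, 1994).

```python
def fraction_unassociated(t, k2, c0):
    """Fraction of monomers not yet associated at time t (s) for 2 U -> D,
    assuming d[U]/dt = -k2 [U]^2, no back reaction, total subunit
    concentration c0 (M), and k2 in M^-1 s^-1."""
    return 1.0 / (1.0 + k2 * c0 * t)


def time_to_extent(extent, k2, c0):
    """Time at which the given fraction (e.g. 0.95) of monomers has associated."""
    return (1.0 / (1.0 - extent) - 1.0) / (k2 * c0)


def apparent_rate(c0, k2, k1):
    """Rough overall rate (s^-1) when association (effective rate ~ k2*c0) and a
    monomolecular step (k1) occur in series: their characteristic times add."""
    return 1.0 / (1.0 / (k2 * c0) + 1.0 / k1)


if __name__ == "__main__":
    k2 = 1e7        # M^-1 s^-1, bimolecular rate constant
    k1 = 1 / 0.005  # s^-1, monomolecular step with a 5 ms time constant
    print("95% association at 16 uM (s):", round(time_to_extent(0.95, k2, 16e-6), 3))
    for c0 in [1e-6, 4e-6, 16e-6, 64e-6, 256e-6, 1024e-6]:
        print(f"{c0 * 1e6:7.0f} uM  k_app = {apparent_rate(c0, k2, k1):6.1f} s^-1")
```

Under these assumptions, 95% association at 16 μM takes about 0.12 s, the same order as the roughly 100 ms observed, and the apparent rate approaches the 200 s⁻¹ limit set by the monomolecular step as the concentration rises: the half-time depends on concentration only in the low-concentration regime.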
Unlike most other oligomeric proteins, the arc repressor folds and associates unusually fast: at a subunit concentration of 16 μM, its folding is complete in about 100 ms (Milla and Sauer, 1994). This is remarkable because it shows that this dimeric protein approaches the folding rates of small monomeric proteins, which of course do not have to associate during folding. It is even more remarkable because its association is not simply a side-by-side arrangement of the monomers but a complex intertwining of the subunits, which certainly requires close coupling between conformational folding and association. The rate of folding increases with protein concentration, and the bimolecular rate constant is as high as 10⁷ M⁻¹ s⁻¹. This is only one or two orders of magnitude lower than the diffusion limit. At high protein concentrations (above 50 μM), the folding rate depends less on concentration, and the rate-limiting event of folding gradually shifts to a monomolecular process with a time constant of about 5 ms (Milla and Sauer, 1994; Milla et al., 1995). This process probably reflects the final folding events in the already dimerized molecules.
By their very nature, refolding experiments are performed under native conditions and unfolding experiments under unfolding conditions (such as high concentrations of a denaturant). The rate of unfolding under native conditions normally cannot be measured, simply because the native protein is stable and does not unfold. Nevertheless, the rate of unfolding under native conditions is an important property because it reports on the conformational dynamics of a folded protein. Usually this information can be obtained only by a long extrapolation of unfolding data measured at high denaturant concentrations to 0 M denaturant. Sauer and coworkers (Jonsson et al., 1996) developed a method to measure the rate constant for the dissociation and unfolding of the arc repressor dimer under conditions where the folded protein is stable. To this end, they prepared two kinds of arc repressor dimers, one labeled with a fluorescence donor and the other with a fluorescence acceptor group. These two populations were mixed at time zero, and the time course of energy transfer between the two groups was followed. Energy transfer can occur only in heterodimers that contain both the donor and the acceptor. Under the strongly native conditions of the experiments, the rate of formation of these heterodimers is determined not by their association (which is very fast under native conditions) but by the rate of the prior unfolding and dissociation of the initially homodimeric molecules. Thus the rate of dissociation/unfolding could be measured under native conditions. It indicates that the arc repressor is indeed a dynamic molecule that unfolds at a rate of about 0.1 s⁻¹ even in the absence of denaturant.
Each subunit of the arc repressor contains a buried salt-bridge triad composed of Arg31, Glu36, and Arg40. Replacing these three residues by three hydrophobic amino acids (Met, Tyr, and Leu, respectively, in the MYL variant) stabilized the dimeric protein by 16.3 kJ/mol (Waldburger et al., 1996; Hendsch et al., 1996).
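The subunit-exchange experiment of Jonsson et al. (1996) can be rationalized with a simple model. The sketch below assumes equal amounts of donor- and acceptor-labeled homodimers, dissociation/unfolding with a single rate constant, rapid random re-association of the released chains, and an energy-transfer signal proportional to the heterodimer fraction. Under these assumptions the heterodimer fraction rises as 0.5(1 - exp(-kt)), so the exchange kinetics report k directly; the small Monte Carlo helper is hypothetical and is not the published analysis.

```python
import math
import random


def heterodimer_fraction(t, k_diss):
    """Expected heterodimer fraction after mixing equal amounts of donor- and
    acceptor-labeled homodimers at t = 0, assuming dimers dissociate with rate
    constant k_diss (s^-1) and the released monomers re-associate rapidly and
    at random. Only dimers that never dissociated are certain to be homodimers."""
    return 0.5 * (1.0 - math.exp(-k_diss * t))


def simulate_exchange(n_dimers, k_diss, t_end, dt, seed=1):
    """Stochastic check of heterodimer_fraction under the same assumptions."""
    rng = random.Random(seed)
    dimers = [("D", "D")] * (n_dimers // 2) + [("A", "A")] * (n_dimers // 2)
    p_diss = 1.0 - math.exp(-k_diss * dt)  # dissociation probability per step
    for _ in range(round(t_end / dt)):
        kept, pool = [], []
        for dimer in dimers:
            if rng.random() < p_diss:
                pool.extend(dimer)  # both chains are released
            else:
                kept.append(dimer)
        rng.shuffle(pool)  # fast, random re-association of released chains
        kept.extend(zip(pool[0::2], pool[1::2]))
        dimers = kept
    return sum(1 for a, b in dimers if a != b) / len(dimers)


if __name__ == "__main__":
    k = 0.1  # s^-1, the unfolding/dissociation rate quoted for the arc repressor
    print("half-time of exchange (s):", round(math.log(2) / k, 1))
    print("analytic, t = 10 s: ", round(heterodimer_fraction(10.0, k), 3))
    print("simulated, t = 10 s:", round(simulate_exchange(10000, k, 10.0, 0.2), 3))
```

With k = 0.1 s⁻¹, the exchange has a half-time of about 7 s (ln 2/k), which is easily followed by conventional fluorescence methods.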
Depending on the conditions, the MYL substitutions accelerated the refolding/association reaction 10- to 1250-fold, and at 0 M urea an association rate constant of 3 × 10⁸ M⁻¹ s⁻¹ was reached. From this strong rate enhancement, Waldburger and colleagues (1996) concluded that the difficult formation of the buried polar interactions of Arg31, Glu36, and Arg40 contributes to the energetic barrier to folding, which in the wild-type protein is located late, after dimerization. In the MYL mutant, this step is optimized, and the barrier is shifted to an earlier stage of folding. The results for the arc repressor show clearly that oligomeric proteins do not necessarily fold slowly. Rather, this small dimer folds and associates as fast as monomeric proteins of similar size.
The Trp repressor of E. coli is also a small homodimeric protein of highly intertwined helical monomers, of 107 residues each. The monomers form secondary structure within the dead time of stopped-flow CD experiments, but the native dimers are formed in a complex folding/association process that consists of at least three phases and takes about 1 min at a protein concentration of 10 μM (Mann and Matthews, 1993; Mann et al., 1995). A comparison of the data for the arc repressor and the Trp repressor indicates that, as for small monomeric proteins, there seems to be a strong correlation between the size of a protein and its rate of folding.
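The energetic figures quoted for the MYL variant of the arc repressor (a stabilization of 16.3 kJ/mol and a 10- to 1250-fold acceleration of refolding/association) can be put on a common scale with the relation ΔG = RT ln(factor). The sketch below assumes 25 °C, because the temperature is not stated here; it converts the stabilization into a fold change of the equilibrium constant and the rate enhancements into apparent changes in activation free energy.

```python
import math

R = 8.314  # gas constant in J mol^-1 K^-1


def fold_change(ddg_kj, T=298.15):
    """Factor by which an equilibrium (or rate) constant changes for a
    free-energy change of ddg_kj kJ/mol at temperature T (K)."""
    return math.exp(ddg_kj * 1000.0 / (R * T))


def ddg_from_fold_change(factor, T=298.15):
    """Free-energy change in kJ/mol equivalent to a given fold change."""
    return R * T * math.log(factor) / 1000.0


if __name__ == "__main__":
    print(f"16.3 kJ/mol stabilization -> {fold_change(16.3):.0f}-fold in K")
    for factor in (10, 1250):
        print(f"{factor}-fold faster -> {ddg_from_fold_change(factor):.1f} kJ/mol")
```

At 25 °C, 16.3 kJ/mol corresponds to roughly a 700-fold change in the equilibrium constant, and a 1250-fold acceleration to about 18 kJ/mol. The comparison is only indicative, because the rate enhancements were measured under different conditions.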
FOLDING OF LARGE MONOMERIC AND OLIGOMERIC PROTEINS
The most dramatic difference between small and large proteins, be they monomeric or oligomeric, lies in the rates of folding. Whereas small proteins often fold within milliseconds or seconds (unless they are retarded by slow events such as prolyl isomerizations or the formation of disulfide bonds), large proteins often require minutes, hours, or even days to complete folding. As a consequence of these very slow processes, nonproductive side reactions often compete with correct folding, and the yields of native protein can be very low (Garel, 1992; Jaenicke, 1987; Yem et al., 1992).
Large Monomeric Proteins
The folding kinetics of large monomeric proteins are generally complex. Probes that are sensitive to the formation of secondary and tertiary structure (e.g., amide CD or tryptophan fluorescence) or to the formation of a collapsed structure (e.g., binding of the hydrophobic dye ANS) often reveal rapid changes within less than a second. They monitor gross folding, presumably of the individual domains, early in refolding. The intermediates formed in these early reactions are fairly unstable and easily degraded by proteases. The native state, with its functional properties (such as enzymatic activity) and its resistance to proteolytic cleavage, is regained much more slowly, often within minutes or even hours.
Octopine Dehydrogenase (ODH) from Pecten jacobaeus
A fast conformational folding followed by slow reactivation is clearly seen in the refolding of octopine dehydrogenase (ODH) (Teschner et al., 1987). ODH is a monomeric enzyme with a molecular weight of 45,000 Da. It is an NAD-dependent dehydrogenase and catalyzes the reductive condensation of arginine and pyruvate to octopine. The kinetics of refolding and reactivation of ODH are shown in Figure 15. The ellipticity at 222 nm and the fluorescence at 330 nm are regained within the time required for manual mixing (about 15 s), which indicates that a nativelike conformation is reached very rapidly in this folding reaction. The product of this reaction, however, is not enzymatically active. Activity returns in a much slower process with a half-time of about 20 min. Interestingly, this final process was retarded when the viscosity of the solvent was increased by adding 30% glycerol. From this result, Teschner and colleagues (1987) concluded that the slow event in the folding of ODH involves the correct pairing of the prefolded domains. It is slow and viscosity-dependent because the domains diffuse together in the rate-limiting step. Similar viscosity-dependent domain-pairing reactions were found in the folding of the α subunit of tryptophan synthase (Chrunyk and Matthews, 1990; Hurle et al., 1987) and of aspartokinase-homoserine dehydrogenase (Vaucheret et al., 1987). The division of the folding of a large protein into only two processes, namely the fast folding of the individual domains and the slow coalescence of these domains (as suggested by the data on ODH in Figure 15), provides a useful framework for understanding such folding reactions, but it is clearly an oversimplification. Even in the case of ODH, a third, intermediate phase becomes apparent when folding is monitored by protease resistance (Teschner et al., 1987). This property of the native protein is regained in a process that is about fourfold faster than reactivation.
[Figure 15: regain of native signal (%) versus time (min, 0 to 60, plus a 24 h point)]
Figure 15. Refolding and reactivation of octopine dehydrogenase (ODH) after 1 h denaturation in 6.0 M GdmCl at 20 °C, as determined by protein fluorescence at 330 nm, relative ellipticity at 222 nm, and the regain of activity. The respective data of the renatured protein were taken as 100%. Refolding was performed at 20 °C in 0.1 M sodium phosphate, 1 mM dithioerythritol, and 0.1 M residual GdmCl at pH 7.6, at a protein concentration of 4.5 μg/ml. The figure is reprinted with permission from Teschner and colleagues (1987). Copyright [1987] American Chemical Society.
Folding of Other Large Proteins
Folding experiments with the two independent subunits of tryptophan synthase (TSα and TSβ) illustrate the complexity of the folding reactions of large proteins. TSα and TSβ have been investigated in great detail in the laboratories of C. R. Matthews and M. E. Goldberg, respectively, with many different probes, including conformation-specific antibodies (Hurle et al., 1987; Saab-Rincon et al., 1996; Tsoji et al., 1993; Chaffotte et al., 1991; Blond-Elguindi and Goldberg, 1990; Tokatlidis et al., 1995; Planchenault et al., 1996). For both proteins, multiple folding steps were detected that span a time range from milliseconds to several minutes. Although there is a clear trend for the duration and complexity of folding to increase with the size of the protein, this is not always the case. Several large proteins, such as mannitol-1-phosphate dehydrogenase from E. coli (Garel, 1992) and aldolase from S. aureus (R. Rudolph, personal communication), regain their native, catalytically active conformation within a few seconds after refolding is initiated. Prolyl cis/trans isomerizations, which are the major cause of slow refolding reactions in single-domain proteins, are usually not apparent in the folding of large proteins. Normally they are much faster than the domain-pairing reactions and thus are not rate-limiting for overall folding.
The Folding Mechanism of Large Monomeric Proteins: Speculations
Despite the complexity and diversity of the folding reactions of large proteins, some general conclusions can be drawn. It is plausible (and backed by experimental evidence) that domain folding and coalescence are the major steps in the folding of a large protein. Two problems are apparent in such a sequential process: (i) the domains might not be stable enough to fold independently in isolation, and (ii) the surfaces for the interactions with the other domains must already be present and exposed on the prefolded domains to allow their subsequent correct pairing. These contact areas become part of the protein interior and are thus less hydrophilic than the protein surface. This is a problem, because it may lead to increased nonspecific aggregation of the prefolded but unassembled domains. Individual domains can fold almost to completion and remain soluble when they are stable in isolation and when the interaction energies and interaction surfaces between the domains in the native protein are small. For such proteins, the domain organization is often easily recognized by eye in the crystal structure. The immunoglobulin chains belong to this group of proteins. In globular proteins such as the dehydrogenases, the individual domains are not so easily recognized because they have large contact areas in the folded protein. Such domains cannot fold easily in the absence of the other domains, and if they do, they have extended hydrophobic regions exposed at their surface. There are several ways to overcome this problem and minimize the exposure of aggregation-prone surfaces. The domains could first fold to an alternatively structured form in which the interaction surfaces are masked, and this form would exist in equilibrium with a less stable but "open" conformation that can pair rapidly with another domain. This would minimize the risk of unwanted aggregation.
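The proposed equilibrium between a masked and an open, pairing-competent form amounts to a pre-equilibrium mechanism: closed ⇌ open → paired. The sketch below, with hypothetical rate constants, compares the familiar rapid-pre-equilibrium approximation, k_obs = k_pair K/(1 + K) with K = k_open/k_close, with the exact slow rate of the linear three-state scheme. It assumes that pairing of the covalently linked domains is first order and irreversible.

```python
import math


def k_obs_pre_equilibrium(k_open, k_close, k_pair):
    """Approximate pairing rate when only the 'open' form can pair and
    closed <=> open equilibrates much faster than pairing:
    k_obs = k_pair * K / (1 + K), with K = k_open / k_close."""
    K = k_open / k_close
    return k_pair * K / (1.0 + K)


def k_obs_exact(k_open, k_close, k_pair):
    """Slower eigenvalue of the linear scheme closed <=> open -> paired, which
    governs the late, single-exponential phase without any pre-equilibrium
    assumption (written in a numerically stable form)."""
    s = k_open + k_close + k_pair
    return 2.0 * k_open * k_pair / (s + math.sqrt(s * s - 4.0 * k_open * k_pair))


if __name__ == "__main__":
    # Hypothetical rates (s^-1): 1% open form, fast opening and closing.
    k_open, k_close = 1.0, 99.0
    for k_pair in (0.01, 1.0, 10.0, 100.0):
        approx = k_obs_pre_equilibrium(k_open, k_close, k_pair)
        exact = k_obs_exact(k_open, k_close, k_pair)
        print(f"k_pair={k_pair:6.2f}  approx={approx:.4g}  exact={exact:.4g}")
```

When only 1% of the molecules are open, pairing is slowed about 100-fold relative to k_pair as long as pairing is slower than opening and closing; when pairing is fast, the opening step itself becomes rate-limiting and the simple approximation fails.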
Such masking would, however, also reduce the rate of folding, because the concentration of the open, assembly-competent form is small. Nevertheless, such a requirement for opening of the prefolded domains would nicely explain why domain pairing is often very slow, although the folding domains are covalently linked and thus very close to each other. Alternatively, the folding domains might expose only a rudimentary interaction surface that is much smaller and much less complementary than in the native protein but still sufficient for an initial docking of the domains. In this case, domain pairing could occur early in folding but must be followed by major conformational rearrangements. A third possibility is that, as in stability, there is a hierarchy in the folding kinetics of the individual domains of a protein. One domain (probably the most stable one) folds first and serves as a scaffold for the folding of the other domains. Such a mechanism is probably used by γII crystallin. At the ribosome, a protein is synthesized from the amino to the carboxy terminus, and therefore the amino-terminal domain would be expected to serve as the scaffold for further folding. In the folding of γII crystallin, it is indeed the amino-terminal domain that is more stable and folds first (Rudolph et al., 1990).
Large Oligomeric Proteins
The folding and association of large oligomeric proteins is conceptually similar to the folding of large monomeric proteins, because domain-pairing and subunit-association reactions are related processes. Subunit association, however, is a bimolecular process and thus depends on protein concentration. As in domain-pairing reactions, subunits must first fold to an association-competent form, associate, and then complete folding by further rearrangements. The association reaction itself or the monomolecular rearrangements before and after association could be the slowest step of overall folding. Association is often rate-limiting at low, but not at high, protein concentration. The folding of oligomeric proteins has been the subject of several excellent in-depth reviews, which describe a number of representative, well-studied systems (Garel, 1992; Jaenicke, 1987, 1996a; Jaenicke et al., 1979; Seckler and Jaenicke, 1992). Here I will use the tailspike protein from the Salmonella phage P22 as a model to discuss basic aspects of the folding of large oligomeric proteins. The P22 tailspike protein is a homotrimer of three 72-kDa chains. Its crystal structure is known. The monomers of the tailspike protein form long parallel β helices. They expose rather flat interaction surfaces and associate side by side in the trimer (Steinbacher et al., 1994). In the carboxy-terminal chain regions, the monomers are strongly intertwined. The P22 bacteriophage uses the tailspike protein for attachment to the surface of Salmonella cells.
The folding and assembly pathway of the tailspike protein has been studied very well both in vivo and in vitro, and it is clear now that the competition between productive folding/association to the native trimer and unspecific aggregation to an insoluble form is the major factor in the folding of this protein (Goldenberg and King, 1982; Goldenberg et al., 1983; Seckler and Jaenicke, 1992; Beissinger et al., 1995). When fully assembled, the trimeric protein is highly resistant to thermally or detergent-induced unfolding, but a monomeric and a trimeric intermediate (called the protrimer) on the folding pathway are only marginally stable. The association of the monomeric folding intermediates to the protrimer is fast, but the conversion
Protein Folding
203
of the protrimer to the native trimer is slow and thus determines the rate of overall folding in vitro as well as in vivo. This conversion occurs with similar rates in vitro and in vivo. The low rate of the protrimer —> trimer conversion, the marginal stability of the protrimer, and its high tendency to aggregate are the key elements to understand the folding mechanism of the tail spike protein. They explain why its folding becomes temperature-sensitive near 40 °C, although the native trimer is stable at temperatures higher than 80 °C. Above 40 °C, the protrimer becomes increasingly unstable and its productive folding to the native trimer can no longer compete efficiently with unspecific aggregation. Many mutations decrease the folding yield because the protrimer is so unstable. They have been termed "temperature sensitive for folding," or te/mutations. The fo/phenotype could be suppressed by second-site mutations (su), which reverted the ta/phenotype (Fane et al., 1991; Mitraki et al., 1991, King et al., 1996). When introduced into the wild-type protein, the su mutations increased the folding yield at elevated temperature. It was originally thought that the tsf and su mutations affected only the pathway of folding by changing the stability of the protrimer folding intermediate, but not of the final, native trimer (Sturtevant et al., 1989). A careful examination of the native proteins, however, revealed that the ta/and su mutations affected the stability of the native protein in a manner that parallels their effects on the folding process (Beissinger et al., 1995). The assembly pathway of the tail spike protein can be well understood on the basis of its three-dimensional structure. The extended parallel (3 helix in the monomer forms rapidly and thus the monomeric folding intermediate is generated. The (3 helix domains of three protein chains then associate to form the protrimer. 
In this trimeric intermediate, the carboxy-terminal regions are presumably still unfolded, which would explain why the protrimer is only marginally stable and prone to aggregation. The folding and interdigitation of the carboxy-terminal regions is the slow, rate-limiting event in the conversion of the protrimer to the native trimer. In this reaction, hydrophobic surface is buried and additional interactions are formed. As a result, the trimer gains strongly in stability and its tendency to aggregate diminishes.
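The kinetic competition described above can be made concrete with a toy partitioning model. The following Python sketch is an illustration under stated assumptions, not a model taken from the cited studies: it treats the protrimer as a two-state species with a hypothetical unfolding enthalpy (300 kJ/mol) and midpoint (45 °C), assumes that only the folded protrimer converts to the native trimer while only the unfolded protrimer aggregates, approximates association as pseudo-first-order, and uses invented rate constants (`k_conv`, `k_agg`, `k_assoc`). With these choices it reproduces two qualitative points from the text: the folding yield falls steeply as the temperature approaches and exceeds 40 °C, and when association is fast the overall folding half-time is set by the slow protrimer → trimer step.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def fraction_unfolded(temp_c: float, dh_kj: float = 300.0, tm_c: float = 45.0) -> float:
    """Fraction of protrimer in an unfolded, aggregation-prone state.

    Two-state van 't Hoff model; dh_kj (kJ/mol) and tm_c (deg C) are
    hypothetical values, and the heat-capacity change is ignored.
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    dg = dh_kj * 1000.0 * (1.0 - t / tm)  # protrimer stability, J/mol
    x = dg / (R * t)
    # Numerically stable form of the logistic 1 / (1 + exp(x)).
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def folding_yield(temp_c: float, k_conv: float = 1e-3, k_agg: float = 1e-2,
                  tm_c: float = 45.0) -> float:
    """Fraction of chains that reach the native trimer.

    The protrimer partitions between conversion to the native trimer
    (folded protrimer, rate k_conv) and aggregation (unfolded protrimer,
    rate k_agg). Both rate constants (s^-1) are hypothetical, and the
    protrimer folding/unfolding equilibrium is assumed to be fast.
    """
    fu = fraction_unfolded(temp_c, tm_c=tm_c)
    k_native = k_conv * (1.0 - fu)  # productive branch
    k_aggregate = k_agg * fu        # nonproductive branch
    return k_native / (k_native + k_aggregate)


def native_fraction(t: float, k_assoc: float = 1.0, k_conv: float = 1e-3,
                    k_agg_eff: float = 0.0) -> float:
    """Native trimer at time t (fraction of chains) for M -> P (fast),
    followed by P -> N (k_conv) competing with P -> A (k_agg_eff).

    Association is approximated as pseudo-first-order (k_assoc); the
    closed-form solution requires k_assoc != k_conv + k_agg_eff.
    """
    k_out = k_conv + k_agg_eff
    m = math.exp(-k_assoc * t)
    p = k_assoc / (k_out - k_assoc) * (math.exp(-k_assoc * t) - math.exp(-k_out * t))
    # N and A are formed from P in the fixed ratio k_conv : k_agg_eff.
    return k_conv / k_out * (1.0 - m - p)


if __name__ == "__main__":
    for temp in (20.0, 30.0, 35.0, 40.0, 45.0):
        print(f"{temp:4.0f} C  yield = {folding_yield(temp):.2f}")
    # With fast association, the half-time is set by the slow step: ln2 / k_conv.
    print(f"N at t = ln2/k_conv: {native_fraction(math.log(2) / 1e-3):.3f}")
```

Lowering the hypothetical midpoint `tm_c` (mimicking a destabilizing tsf mutation) shifts the yield collapse to lower temperatures, and raising it mimics an su mutation. The stability of the native trimer does not enter this model at all, which illustrates how folding can fail at temperatures where the native trimer itself is stable.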
CONCLUDING REMARKS
In recent years, much progress has been made in the investigation of folding reactions: the structures of some folding intermediates are known at fairly high resolution, and the properties of critical activated states are beginning to emerge. Nevertheless, the protein-folding problem is far from solved. This is not surprising, given the complexity of folded proteins. It seems that a single general solution to this problem cannot be found, simply because proteins are so diverse and certainly use different strategies in folding. This is now clear from the contrast between small and large proteins. Small proteins usually do not form populated intermediates, and thus they can fold much faster than they are synthesized. Large proteins with several domains fold in parts, and it is the assembly of the prefolded domains that makes their folding slow and prone to side reactions such as aggregation, both in vitro and in vivo. Folding a large protein is difficult for the investigator and also for the cell. The difficulty does not arise because the sequence information for the folded state is insufficient, but because the folding chain must be guided safely along a very shallow free-energy gradient to the native conformation. Here, chaperones and folding enzymes help to avoid aggregation, shorten the time a protein spends in partially folded states, and rescue proteins from trapped nonnative conformations.
REFERENCES
Alexandrescu, A.T., Evans, P.A., Pitkeathly, M., Baum, J., and Dobson, C.M. (1993). Structure and Dynamics of the Acid-Denatured Molten Globule State of α-Lactalbumin: A 2-Dimensional NMR Study. Biochemistry 32, 1707-1718.
Arcus, V.L., Vuilleumier, S., Freund, S.M.V., Bycroft, M., and Fersht, A.R. (1995). A comparison of the pH, urea, and temperature-denatured states of barnase by heteronuclear NMR: Implications for the initiation of protein folding. J. Mol. Biol. 254, 305-321.
Balbach, J., Forge, V., Lau, W.S., van Nuland, N.A.J., Brew, K., and Dobson, C.M. (1996). Protein folding monitored at individual residues during a two-dimensional NMR experiment. Science 274, 1161-1163.
Balbach, J., Forge, V., Lau, W.S., Jones, J.A., van Nuland, N.A.J., and Dobson, C.M. (1997). Detection of residue contacts in a protein folding intermediate. Proc. Natl. Acad. Sci. USA 94, 7182-7185.
Baldwin, R.L. (1993). Pulsed H/D-exchange studies of folding intermediates. Curr. Opin. Struct. Biol. 3, 84-91.
Baldwin, R.L. (1996). On-pathway versus off-pathway folding intermediates. Folding and Design 1, R1-R8.
Baldwin, T.O., Ziegler, M.M., Chaffotte, A.F., and Goldberg, M.E. (1993). Contribution of folding steps involving the individual subunits of bacterial luciferase to the assembly of the active heterodimeric enzyme. J. Biol. Chem. 268, 10766-10772.
Ballew, R.M., Sabelko, J., and Gruebele, M. (1996a). Direct observation of fast protein folding: The initial collapse of apomyoglobin. Proc. Natl. Acad. Sci. USA 93, 5759-5764.
Ballew, R.M., Sabelko, J., and Gruebele, M. (1996b). Observation of distinct nanosecond and microsecond protein folding events. Nature Struct. Biol. 3, 923-926.
Bardwell, J.C.A. and Beckwith, J. (1993). The Bonds that Tie: Catalyzed Disulfide Bond Formation. Cell 74, 769-771.
Barrick, D. and Baldwin, R.L. (1993a). The Molten Globule Intermediate of Apomyoglobin and the Process of Protein Folding. Protein Sci. 2, 869-876.
Barrick, D. and Baldwin, R.L. (1993b). Three-State Analysis of Sperm Whale Apomyoglobin Folding. Biochemistry 32, 3790-3796.
Beissinger, M., Lee, S.C., Steinbacher, S., Reinemer, P., Huber, R., Yu, M.H., and Seckler, R. (1995). Mutations that stabilize folding intermediates of phage P22 tailspike protein: Folding in vivo and in vitro, stability, and structural context. J. Mol. Biol. 249, 185-194.
Blond-Elguindi, S. and Goldberg, M.E. (1990). Conformational Change in the N-Terminal Domain of the Escherichia coli Tryptophan Synthase β2 Subunit Induced by Its Interactions with Monoclonal Antibodies. Res. Immunol. 141, 879-892.
Brandts, J.F., Halvorson, H.R., and Brennan, M. (1975). Consideration of the possibility that the slow step in protein denaturation reactions is due to cis-trans isomerism of proline residues. Biochemistry 14, 4953-4963.
Brandts, J.F., Hu, C.Q., and Lin, L.-N. (1989). A simple model for proteins with interacting domains. Application to scanning calorimetry data. Biochemistry 28, 8588-8596.
Bryngelson, J.D., Onuchic, J.N., Socci, N.D., and Wolynes, P.G. (1995). Funnels, pathways, and the energy landscape of protein folding: A synthesis. Proteins: Struct. Funct. Genet. 21, 167-195.
Buchner, J. (1996). Supervising the fold: Functional principles of molecular chaperones. FASEB J. 10, 10-19.
Buchner, J., Schmidt, M., Fuchs, M., Jaenicke, R., Rudolph, R., Schmid, F.X., and Kiefhaber, T. (1991). GroE Facilitates Refolding of Citrate Synthase by Suppressing Aggregation. Biochemistry 30, 1586-1591.
Buck, M., Radford, S.E., and Dobson, C.M. (1993). A Partially Folded State of Hen Egg White Lysozyme in Trifluoroethanol: Structural Characterization and Implications for Protein Folding. Biochemistry 32, 669-678.
Burton, R.E., Huang, G.S., Daugherty, M.A., Fullbright, P.W., and Oas, T.G. (1996). Microsecond protein folding through a compact transition state. J. Mol. Biol. 311-322.
Carra, J.H. and Privalov, P.L. (1995). Energetics of Denaturation and m Values of Staphylococcal Nuclease Mutants. Biochemistry 34, 2034-2041.
Chaffotte, A., Guillou, Y., Delepierre, M., Hinz, H.J., and Goldberg, M.E. (1991). The Isolated C-Terminal (F2) Fragment of the Escherichia coli Tryptophan Synthase β2-Subunit Folds into a Stable, Organized Nonnative Conformation. Biochemistry 30, 8067-8074.
Chan, C.-K., Hofrichter, J., and Eaton, W.A. (1996). Optical triggers of protein folding. Science 274, 628-629.
Chan, C.-K., Hu, Y., Takahashi, S., Rousseau, D.L., Eaton, W.A., and Hofrichter, J. (1997). Submillisecond protein folding kinetics studied by ultrarapid mixing. Proc. Natl. Acad. Sci. USA 94, 1779-1784.
Chazin, W.J., Kordel, J., Drakenberg, T., Thulin, E., Brodin, P., Grundstrom, T., and Forsén, S. (1989). Proline isomerism leads to multiple folded conformations of calbindin D9k: Direct evidence from two-dimensional NMR spectroscopy. Proc. Natl. Acad. Sci. USA 86, 2195-2198.
Chen, B.-L., Baase, W.A., and Schellman, J.A. (1989). Low-temperature unfolding of a mutant of phage T4 lysozyme. 2. Kinetic investigations. Biochemistry 28, 691-699.
Chen, B.-L., Baase, W.A., Nicholson, H., and Schellman, J.A. (1992). Folding Kinetics of T4 Lysozyme and Nine Mutants at 12 °C. Biochemistry 31, 1464-1476.
Cheng, H.N. and Bovey, F.A. (1977). Cis-trans equilibrium and kinetic studies of acetyl-L-proline and glycyl-L-proline. Biopolymers 16, 1465-1472.
Chrunyk, B.A. and Matthews, C.R. (1990). Role of diffusion in the folding of the α subunit of tryptophan synthase from Escherichia coli. Biochemistry 29, 2149-2154.
Clark, A.C., Sinclair, J.F., and Baldwin, T.O. (1993). Folding of bacterial luciferase involves a non-native heterodimeric intermediate in equilibrium with the native enzyme and the unfolded subunits. J. Biol. Chem. 268, 10773-10779.
Cook, K.H., Schmid, F.X., and Baldwin, R.L. (1979). Role of proline isomerization in folding of ribonuclease A at low temperatures. Proc. Natl. Acad. Sci. USA 76, 6157-6161.
Creighton, T.E. (1978). Experimental studies of protein folding and unfolding. Prog. Biophys. Mol. Biol. 33, 231-297.
Creighton, T.E. (1985). The problem of how and why proteins adopt folded conformations. J. Phys. Chem. 89, 2452-2459.
Creighton, T.E. (1986). Disulfide bonds as probes of protein folding pathways. Methods Enzymol. 131, 83-106.
Creighton, T.E. (1990). Protein folding. Biochem. J. 270, 1-16.
Creighton, T.E. (Ed.) (1992a). Protein Folding. W.H. Freeman, New York.
Creighton, T.E. (1992b). Folding pathways determined using disulfide bonds. In: Protein Folding. (Creighton, T.E., Ed.), pp. 301-351. W.H. Freeman, New York.
Creighton, T.E. (1994). The energetic ups and downs of protein folding. Nature Struct. Biol. 1, 135-138.
Creighton, T.E. (1995). Disulphide-coupled protein folding pathways. Philos. Trans. R. Soc. Lond. B 348, 5-10.
Creighton, T.E., Darby, N.J., and Kemmink, J. (1996). The roles of partly folded intermediates in protein folding. FASEB J. 10, 110-118.
Darby, N.J. and Creighton, T.E. (1993). Dissecting the disulfide-coupled folding pathway of bovine pancreatic trypsin inhibitor. J. Mol. Biol. 232, 873-896.
Darby, N.J. and Creighton, T.E. (1995). Catalytic mechanism of DsbA and its comparison with that of protein disulfide isomerase. Biochemistry 34, 3576-3587.
Dobson, C.M. (1991). NMR Spectroscopy and Protein Folding: Studies of Lysozyme and α-Lactalbumin. Protein Conformation 161, 167-189.
Dobson, C.M. (1995). Finding the right fold. Nature Struct. Biol. 2, 513-517.
Eaton, W.A., Thompson, P.A., Chan, C.-K., Hagen, S.J., and Hofrichter, J. (1996). Fast events in protein folding. Structure 4, 1133-1139.
Eaton, W.A., Muñoz, V., Thompson, P.A., Chan, C.-K., and Hofrichter, J. (1997). Submillisecond kinetics of protein folding. Curr. Opin. Struct. Biol. 7, 10-14.
Evans, P.A., Dobson, C.M., Kautz, R.A., Hatfull, G., and Fox, R.O. (1987). Proline isomerism in staphylococcal nuclease characterized by NMR and site-directed mutagenesis. Nature 329, 266-268.
Evans, P.A., Topping, K.D., Woolfson, D.N., and Dobson, C.M. (1991). Hydrophobic Clustering in Nonnative States of a Protein: Interpretation of Chemical Shifts in NMR Spectra of Denatured States of Lysozyme. Proteins: Struct. Funct. Genet. 9, 248-266.
Ewbank, J.J. and Creighton, T.E. (1991). The molten globule protein conformation probed by disulphide bonds. Nature 350, 518-520.
Ewbank, J.J., Creighton, T.E., Hayer-Hartl, M.K., and Hartl, F.U. (1995). What is the molten globule? Nature Struct. Biol. 2, 10.
Fane, B., Villafane, R., Mitraki, A., and King, J. (1991). Identification of Global Suppressors for Temperature-Sensitive Folding Mutations of the P22 Tailspike Protein. J. Biol. Chem. 266, 11640-11648.
Fenton, W.A. and Horwich, A.L. (1997). GroEL-mediated protein folding. Protein Sci. 6, 743-760.
Fersht, A.R. (1993). Protein Folding and Stability: The Pathway of Folding of Barnase. FEBS Lett. 325, 5-16.
Fersht, A.R. (1995a). Characterizing transition states in protein folding: An essential step in the puzzle. Curr. Opin. Struct. Biol. 5, 79-84.
Fersht, A.R. (1995b). Mapping the structures of transition states and intermediates in folding: Delineation of pathways at high resolution. Philos. Trans. R. Soc. Lond. B 348, 11-15.
Fersht, A.R. and Serrano, L. (1993). Principles of protein stability derived from protein engineering experiments. Curr. Opin. Struct. Biol. 3, 75-83.
Fersht, A.R., Bycroft, M., Horovitz, A., Kellis, J.T., Matouschek, A., and Serrano, L. (1991). Pathway and Stability of Protein Folding. Philos. Trans. R. Soc. Lond. B 332, 171-176.
Fersht, A.R., Matouschek, A., and Serrano, L. (1992). The Folding of an Enzyme. 1. Theory of Protein Engineering Analysis of Stability and Pathway of Protein Folding. J. Mol. Biol. 224, 771-782.
Fischer, G. (1994). Peptidyl-prolyl cis/trans isomerases and their effectors. Angew. Chem. Int. Ed. 33, 1415-1436.
Fischer, G., Bang, H., and Mech, C. (1984). Nachweis einer Enzymkatalyse für die cis-trans-Isomerisierung der Peptidbindung in prolinhaltigen Peptiden [Detection of enzyme catalysis for the cis-trans isomerization of the peptide bond in proline-containing peptides]. Biomed. Biochim. Acta 43, 1101-1111.
Fischer, G., Wittmann-Liebold, B., Lang, K., Kiefhaber, T., and Schmid, F.X. (1989). Cyclophilin and peptidyl-prolyl cis-trans isomerase are probably identical proteins. Nature 337, 476-478.
Franke, E.K., Yuan, H.E.H., and Luban, J. (1994). Specific incorporation of cyclophilin A into HIV-1 virions. Nature 372, 359-362.
Freedman, R.B. (1992). Protein folding in the cell. In: Protein Folding. (Creighton, T.E., Ed.), pp. 455-539. W.H. Freeman, New York.
Freedman, R.B., Hirst, T.R., and Tuite, M.F. (1994). Protein disulphide isomerase: Building bridges in protein folding. TIBS 19, 331-336.
Frydman, J. and Hartl, F.U. (1996). Principles of chaperone-assisted protein folding: Differences between in vitro and in vivo mechanisms. Science 272, 1497-1502.
Galat, A. and Metcalfe, S.M. (1995). Peptidylproline cis/trans isomerases. Prog. Biophys. Mol. Biol. 63, 67-118.
Garel, J.-R. (1992). Folding of large proteins: Multidomain and multisubunit proteins. In: Protein Folding. (Creighton, T.E., Ed.), pp. 405-454. W.H. Freeman, New York.
Garel, J.-R. and Baldwin, R.L. (1973). Both the fast and slow refolding reactions of ribonuclease A yield native enzyme. Proc. Natl. Acad. Sci. USA 70, 3347-3351.
Garvey, E.P. and Matthews, C.R. (1989). Effects of multiple replacements at a single position on the folding and stability of dihydrofolate reductase from E. coli. Biochemistry 28, 2083-2093.
Go, N. (1983). Theoretical studies of protein folding. Annu. Rev. Biophys. Bioeng. 12, 183-210.
Goldberg, M.E. (1985). The second translation of the genetic message: Protein folding and assembly. TIBS 10, 388-391.
Goldenberg, D.P. (1992). Native and non-native intermediates in the BPTI folding pathway. TIBS 17, 257-261.
Goldenberg, D.P. and King, J. (1982). Trimeric intermediate in the in vivo folding and subunit assembly of the tail spike endorhamnosidase of bacteriophage P22. Proc. Natl. Acad. Sci. USA 79, 3403-3407.
Goldenberg, D.P., Smith, D.H., and King, J. (1983). Genetic analysis of the folding pathway for the tail spike protein of phage P22. Proc. Natl. Acad. Sci. USA 80, 7060-7064.
Goloubinoff, P., Christeller, J.T., Gatenby, A.A., and Lorimer, G.H. (1989). Reconstitution of active dimeric ribulose bisphosphate carboxylase from an unfolded state depends on two chaperonin proteins and Mg-ATP. Nature 342, 884-889.
Goto, Y. and Hamaguchi, K. (1982a). Unfolding and refolding of the reduced constant fragment of the immunoglobulin light chain. Kinetic role of the disulfide bond. J. Mol. Biol. 156, 911-926.
Goto, Y. and Hamaguchi, K. (1982b). Unfolding and refolding of the constant fragment of the immunoglobulin light chain. J. Mol. Biol. 156, 891-910.
Grathwohl, C. and Wüthrich, K. (1981). NMR studies of the rates of proline cis-trans isomerization in oligopeptides. Biopolymers 20, 2623-2633.
Haezebrouck, P., Joniau, M., Vandael, H., Hooke, S.D., Woodruff, N.D., and Dobson, C.M. (1995). An equilibrium partially folded state of human lysozyme at low pH. J. Mol. Biol. 246, 382-387.
Hagerman, P.J. and Baldwin, R.L. (1976). A quantitative treatment of the kinetics of the folding transition of ribonuclease A. Biochemistry 15, 1462-1473.
Handschumacher, R.E., Harding, M.W., Rice, J., and Drugge, R.J. (1984). Cyclophilin: A specific cytosolic binding protein for cyclosporin A. Science 226, 544-547.
Harrison, S.C. and Durbin, R. (1985). Is there a single pathway for the folding of a polypeptide chain? Proc. Natl. Acad. Sci. USA 82, 4028-4030.
Hartl, F.U. (1996). Molecular chaperones in cellular protein folding. Nature 381, 571-580.
Hendsch, Z.S., Jonsson, T., Sauer, R.T., and Tidor, B. (1996). Protein stabilization by removal of unsatisfied polar groups: Computational approaches and experimental tests. Biochemistry 35, 7621-7625.
Herzberg, O. and Moult, J. (1991). Analysis of the steric strain in the polypeptide backbone of protein molecules. Proteins: Struct. Funct. Genet. 11, 223-229.
Hirs, C.H.W. and Timasheff, S.N. (1986). Methods in Enzymology, Vol. 131. Academic Press, Orlando, FL.
Houry, W.A., Rothwarf, D.M., and Scheraga, H.A. (1994). A very fast phase in the refolding of disulfide-intact ribonuclease A: Implications for the refolding and unfolding pathways. Biochemistry 33, 2516-2530.
Houry, W.A., Rothwarf, D.M., and Scheraga, H.A. (1995). The nature of the initial step in the conformational folding of disulphide-intact ribonuclease A. Nature Struct. Biol. 2, 495-503.
Huang, G.S. and Oas, T.G. (1995). Submillisecond folding of monomeric λ repressor. Proc. Natl. Acad. Sci. USA 92, 6878-6882.
Hughson, F.M., Wright, P.E., and Baldwin, R.L. (1990). Structural characterization of a partly folded apomyoglobin intermediate. Science 249, 1544-1548.
Hughson, F.M., Barrick, D., and Baldwin, R.L. (1991). Probing the Stability of a Partly Folded Apomyoglobin Intermediate by Site-Directed Mutagenesis. Biochemistry 30, 4113-4118.
Hurle, M.R., Michelotti, G.A., Crisanti, M.M., and Matthews, C.R. (1987). Characterization of a slow folding reaction for the α subunit of tryptophan synthase. Proteins: Struct. Funct. Genet. 2, 54-63.
Hurle, M.R., Anderson, S., and Kuntz, I.D. (1991). Confirmation of the predicted source of a slow folding reaction: Proline 8 of bovine pancreatic trypsin inhibitor. Protein Eng. 4, 451-455.
Ikai, A. and Tanford, C. (1973). Kinetics of unfolding and refolding of proteins. I. Mathematical analysis. J. Mol. Biol. 73, 145-163.
Ikai, A., Fish, W.W., and Tanford, C. (1973). Kinetics of unfolding and refolding of proteins. II. Results for cytochrome c. J. Mol. Biol. 73, 165-184.
Itzhaki, L.S., Otzen, D.E., and Fersht, A.R. (1995). The structure of the transition state for folding of chymotrypsin inhibitor 2 analysed by protein engineering methods: Evidence for a nucleation-condensation mechanism for protein folding. J. Mol. Biol. 254, 260-288.
Jackson, S.E. and Fersht, A.R. (1991a). Folding of Chymotrypsin Inhibitor 2. 2. Influence of Proline Isomerization on the Folding Kinetics and Thermodynamic Characterization of the Transition State of Folding. Biochemistry 30, 10436-10443.
Jackson, S.E. and Fersht, A.R. (1991b). Folding of chymotrypsin inhibitor 2. 1. Evidence for a two-state transition. Biochemistry 30, 10428-10435.
Jaenicke, R. (1987). Folding and association of proteins. Prog. Biophys. Mol. Biol. 49, 117-237.
Jaenicke, R. (1995). Folding and association versus misfolding and aggregation of proteins. Philos. Trans. R. Soc. Lond. B 348, 97-105.
Jaenicke, R. (1996a). Protein folding and association: In vitro studies for self-organization and targeting in the cell. Curr. Top. Cell. Regul. 34, 209-314.
Jaenicke, R. (1996b). Stability and folding of ultrastable proteins: Eye lens crystallins and enzymes from thermophiles. FASEB J. 10, 84-92.
Jaenicke, R., Rudolph, R., and Heider, I. (1979). Quaternary structure, subunit activity, and in-vitro association of porcine mitochondrial malic dehydrogenase. Biochemistry 18, 1217-1223.
Janin, J. and Wodak, S.J. (1983). Structural domains in proteins and their role in the dynamics of protein function. Prog. Biophys. Mol. Biol. 42, 21-78.
Jennings, P.A. and Wright, P.E. (1993). Formation of a Molten Globule Intermediate Early in the Kinetic Folding Pathway of Apomyoglobin. Science 262, 892-896.
Jennings, P.A., Saalau-Bethell, S.M., Finn, B.E., Chen, X., and Matthews, C.R. (1991). Mutational analysis of protein folding mechanisms. Methods Enzymol. 202, 113-126.
Jones, P.G., van Bogelen, R.A., and Neidhardt, F.C. (1987). Induction of proteins in response to low temperature in Escherichia coli. J. Bacteriol. 169, 2092-2095.
Jonsson, T., Waldburger, C.D., and Sauer, R.T. (1996). Nonlinear free energy relationships in arc repressor unfolding imply the existence of unstable, nativelike folding intermediates. Biochemistry 35, 4795-4802.
Jorgensen, W.L. and Gao, J. (1988). Cis-trans energy difference for the peptide bond in the gas phase and in aqueous solution. J. Am. Chem. Soc. 4212-4216.
Kato, S., Shimamoto, N., and Utiyama, H. (1982). Identification and characterization of the direct folding process of hen egg-white lysozyme. Biochemistry 21, 38-43.
Kay, M.S. and Baldwin, R.L. (1996). Packing interactions in the apomyoglobin folding intermediate. Nature Struct. Biol. 3, 439-445.
Kelley, R.F. and Richards, F.M. (1987). Replacement of proline-76 with alanine eliminates the slowest kinetic phase in thioredoxin folding. Biochemistry 26, 6765-6774.
Kelley, R.F. and Stellwagen, E. (1984). Conformational transitions of thioredoxin in guanidine hydrochloride. Biochemistry 23, 5095-5102.
Khorasanizadeh, S., Peters, I.D., Butt, T.R., and Roder, H. (1993). Folding and Stability of a Tryptophan-Containing Mutant of Ubiquitin. Biochemistry 32, 7054-7063.
Khorasanizadeh, S., Peters, I.D., and Roder, H. (1996). Evidence for a three-state model of protein folding from kinetic analysis of ubiquitin variants with altered core residues. Nature Struct. Biol. 3, 193-205.
Kiefhaber, T. and Baldwin, R.L. (1995). Intrinsic stability of individual α helices modulates structure and stability of the apomyoglobin molten globule. J. Mol. Biol. 252, 122-132.
Kiefhaber, T., Grunert, H.-P., Hahn, U., and Schmid, F.X. (1990a). Replacement of a cis proline simplifies the mechanism of ribonuclease T1 folding. Biochemistry 29, 6475-6480.
Kiefhaber, T., Quaas, R., Hahn, U., and Schmid, F.X. (1990b). Folding of ribonuclease T1. 2. Kinetic models for the folding and unfolding reactions. Biochemistry 29, 3061-3070.
Kiefhaber, T., Quaas, R., Hahn, U., and Schmid, F.X. (1990c). Folding of Ribonuclease T1. 1. Existence of multiple unfolded states created by proline isomerization. Biochemistry 29, 3053-3061.
Kiefhaber, T., Kohler, H.H., and Schmid, F.X. (1992). Kinetic Coupling Between Protein Folding and Prolyl Isomerization. 1. Theoretical Models. J. Mol. Biol. 224, 217-229.
Kim, P.S. and Baldwin, R.L. (1982). Specific intermediates in the folding reactions of small proteins and the mechanism of protein folding. Annu. Rev. Biochem. 51, 459-489.
Kim, P.S. and Baldwin, R.L. (1990). Intermediates in the folding reactions of small proteins. Annu. Rev. Biochem. 59, 631-660.
King, J., Haase-Pettingell, C., Robinson, A.S., Speed, M., and Mitraki, A. (1996). Thermolabile folding intermediates: Inclusion body precursors and chaperonin substrates. FASEB J. 10, 57-66.
Kragelund, B.B., Robinson, C.V., Knudsen, J., Dobson, C.M., and Poulsen, F.M. (1995). Folding of a four-helix bundle: Studies of acyl-coenzyme A binding protein. Biochemistry 34, 7217-7224.
Kragelund, B.B., Hojrup, P., Jensen, M.S., Schjerling, C.K., Juul, E., Knudsen, J., and Poulsen, F.M. (1996). Fast and one-step folding of closely and distantly related homologous proteins of a four-helix bundle family. J. Mol. Biol. 256, 187-200.
Kuwajima, K. (1977). A folding model of α-lactalbumin deduced from the three-state denaturation mechanism. J. Mol. Biol. 114, 241-258.
Kuwajima, K. (1989). The molten globule as a clue for understanding the folding and cooperativity of globular-protein structure. Proteins: Struct. Funct. Genet. 6, 87-103.
Kuwajima, K. (1996). The molten globule state of α-lactalbumin. FASEB J. 10, 102-109.
Lecomte, J.T.J. and Matthews, C.R. (1993). Unraveling the Mechanism of Protein Folding: New Tricks for an Old Problem. Protein Eng. 6, 1-10.
Lesk, A.M. and Rose, G.D. (1981). Folding units in globular proteins. Proc. Natl. Acad. Sci. USA 78, 4304-4308.
Levinthal, C.J. (1968). Are there pathways for protein folding? J. Chim. Phys. 65, 44-45.
Lorimer, G.H. (1996). A quantitative assessment of the role of chaperonin proteins in protein folding in vivo. FASEB J. 10, 5-9.
MacArthur, M.W. and Thornton, J.M. (1991). Influence of proline residues on protein conformation. J. Mol. Biol. 218, 397-412.
Makhatadze, G.I. and Privalov, P.L. (1994). Hydration effects in protein unfolding. Biophys. Chem. 51, 291-309.
Makhatadze, G.I. and Privalov, P.L. (1995). Energetics of protein structure. Adv. Protein Chem. 47, 307-425.
Mann, C.J. and Matthews, C.R. (1993). Structure and Stability of an Early Folding Intermediate of Escherichia coli trp Aporepressor Measured by Far-UV Stopped-Flow Circular Dichroism and 8-Anilino-1-Naphthalene Sulfonate Binding. Biochemistry 32, 5282-5290.
Mann, C.J., Shao, X., and Matthews, C.R. (1995). Characterization of the slow folding reactions of trp aporepressor from Escherichia coli by mutational analysis of prolines and catalysis by a peptidylprolyl isomerase. Biochemistry 34, 14573-14580.
Matouschek, A., Kellis, J.T., Serrano, L., Bycroft, M., and Fersht, A.R. (1990). Transient folding intermediates characterized by protein engineering. Nature 346, 440-445.
Matouschek, A., Serrano, L., Meiering, E.M., Bycroft, M., and Fersht, A.R. (1992). The Folding of an Enzyme. 5. H/2H Exchange-Nuclear Magnetic Resonance Studies on the Folding Pathway of Barnase: Complementarity to and Agreement with Protein Engineering Studies. J. Mol. Biol. 224, 837-845.
Matouschek, A., Otzen, D.E., Itzhaki, L.S., Jackson, S.E., and Fersht, A.R. (1995a). Movement of the position of the transition state in protein folding. Biochemistry 34, 13656-13662.
Matouschek, A., Rospert, S., Schmid, K., Glick, B.S., and Schatz, G. (1995b). Cyclophilin catalyzes protein folding in yeast mitochondria. Proc. Natl. Acad. Sci. USA 92, 6319-6323.
Matthews, C.R. (1987). Effect of point mutations on the folding of globular proteins. Methods Enzymol. 154, 498-511.
Matthews, C.R. (1993). Pathways of Protein Folding. Annu. Rev. Biochem. 62, 653-683.
Mayr, L.M. and Schmid, F.X. (1993). Kinetic models for unfolding and refolding of ribonuclease T1 with substitution of cis proline 39 by alanine. J. Mol. Biol. 231, 913-926.
Mayr, L.M., Landt, O., Hahn, U., and Schmid, F.X. (1993). Stability and folding kinetics of ribonuclease T1 are strongly altered by the replacement of cis-proline 39 with alanine. J. Mol. Biol. 231, 897-912.
Mayr, E.-M., Jaenicke, R., and Glockshuber, R. (1994). Domain interactions and connecting peptides in lens crystallins. J. Mol. Biol. 235, 84-88.
Mayr, L.M., Willbold, D., Rosch, P., and Schmid, F.X. (1994). Generation of a non-prolyl cis peptide bond in ribonuclease T1. J. Mol. Biol. 240, 288-293.
Mayr, L.M., Odefey, C., Schutkowski, M., and Schmid, F.X. (1996). Kinetic analysis of the unfolding and refolding of ribonuclease T1 by a stopped-flow double-mixing technique. Biochemistry 35, 5550-5561.
Milla, M.E. and Sauer, R.T. (1994). P22 Arc Repressor: Folding kinetics of a single-domain, dimeric protein. Biochemistry 33, 1125-1133.
Milla, M.E., Brown, B.M., Waldburger, C.D., and Sauer, R.T. (1995). P22 arc repressor: Transition state properties inferred from mutational effects on the rates of protein unfolding and refolding. Biochemistry 34, 13914-13919.
Miranker, A.D. and Dobson, C.M. (1996). Collapse and cooperativity in protein folding. Curr. Opin. Struct. Biol. 6, 31-42.
Miranker, A., Radford, S.E., Karplus, M., and Dobson, C.M. (1991). Demonstration by NMR of Folding Domains in Lysozyme. Nature 349, 633-636.
Mitraki, A., Fane, B., Haase-Pettingell, C., Sturtevant, J., and King, J. (1991). Global Suppression of Protein Folding Defects and Inclusion Body Formation. Science 253, 54-58.
Mullins, L.S., Pace, C.N., and Raushel, F.M. (1993). Investigation of Ribonuclease T1 Folding Intermediates by Hydrogen-Deuterium Amide Exchange: 2-Dimensional NMR Spectroscopy. Biochemistry 32, 6152-6156.
Mücke, M. and Schmid, F.X. (1994). Intact disulfide bonds decelerate the folding of ribonuclease T1. J. Mol. Biol. 239, 713-725.
Myers, J.K., Pace, C.N., and Scholtz, J.M. (1995). Denaturant m values and heat capacity changes: Relation to changes in accessible surface areas of protein unfolding. Protein Sci. 4, 2138-2148.
Nall, B.T. (1985). Proline isomerization and protein folding. Comments Mol. Cell. Biophys. 3, 123-143.
Nall, B.T., Garel, J.-R., and Baldwin, R.L. (1978). Test of the extended two-state model for the kinetic intermediates observed in the folding transition of ribonuclease A. J. Mol. Biol. 118, 317-330.
Nolting, B., Golbik, R., and Fersht, A.R. (1995). Submillisecond events in protein folding. Proc. Natl. Acad. Sci. USA 92, 10668-10672.
Odefey, C., Mayr, L.M., and Schmid, F.X. (1995). Non-prolyl cis-trans peptide bond isomerization as a rate-determining step in protein unfolding and refolding. J. Mol. Biol. 245, 69-78.
Ohguchi, M. and Wada, A. (1984). Liquidlike state of side chains at the intermediate stage of protein denaturation. Adv. Biophys. 18, 75-90.
Oliveberg, M., Tan, Y.-J., and Fersht, A.R. (1995). Negative activation enthalpies in the kinetics of protein folding. Proc. Natl. Acad. Sci. USA 92, 8926-8929.
Otzen, D.E., Itzhaki, L.S., Elmasry, N.F., Jackson, S.E., and Fersht, A.R. (1994). Structure of the transition state for the folding/unfolding of the barley chymotrypsin inhibitor 2 and its implications for mechanisms of protein folding. Proc. Natl. Acad. Sci. USA 91, 10422-10425.
Pace, C.N. (1986). Determination and analysis of urea and guanidine hydrochloride denaturation curves. Methods Enzymol. 131, 266-280.
Pain, R.H. (1994). Mechanisms of Protein Folding. IRL Press, Oxford, England.
Pascher, T., Chesick, J.P., Winkler, J.R., and Gray, H.B. (1996). Protein folding triggered by electron transfer. Science 271, 1558-1560.
Peng, Z., Wu, L.C., and Kim, P.S. (1995). Local Structural Preferences in the α-Lactalbumin Molten Globule. Biochemistry 34, 3248-3252.
Planchenault, T., Navon, A., Schulze, A.J., and Goldberg, M.E. (1996). Transient non-native interactions in early folding intermediates do not influence the folding kinetics of Escherichia coli tryptophan synthase β2 subunits. Eur. J. Biochem. 240, 615-621.
Privalov, P.L. (1979). Stability of proteins. Adv. Protein Chem. 33, 167-241.
Privalov, P.L. (1982). Stability of proteins, proteins which do not present a single cooperative system. Adv. Protein Chem. 35, 1-104.
Privalov, P.L. (1996). Intermediate states in protein folding. J. Mol. Biol. 258, 707-725.
Privalov, P.L. and Gill, S.J. (1988). Stability of protein structure and hydrophobic interaction. Adv. Protein Chem. 39, 191-234.
Privalov, P.L. and Potekhin, S.A. (1986). Scanning microcalorimetry in studying temperature-induced changes in proteins. Methods Enzymol. 131, 4-51.
Ptitsyn, O.B. (1992). The molten globule. In: Protein Folding. (Creighton, T.E., Ed.), pp. 243-300. W.H. Freeman, New York.
Ptitsyn, O.B. (1995). Molten globule and protein folding. Adv. Protein Chem. 47, 83-229.
Rahfeld, J.-U., Schierhorn, A., Mann, K.-H., and Fischer, G. (1994a). A novel peptidyl-prolyl cis/trans isomerase from Escherichia coli. FEBS Lett. 343, 65-69.
Rahfeld, J.-U., Rücknagel, K.P., Schelbert, B., Ludwig, B., Hacker, J., Mann, K., and Fischer, G. (1994b). Confirmation of the existence of a third family among peptidyl-prolyl cis/trans isomerases: Amino acid sequence and recombinant production of parvulin. FEBS Lett. 352, 180-184.
Ramachandran, G.N. and Mitra, A.K. (1976). An explanation for the rare occurrence of cis peptide units in proteins and polypeptides. J. Mol. Biol. 85-92.
Ramdas, L. and Nall, B.T. (1986). Folding/unfolding kinetics of mutant forms of iso-1-cytochrome c with replacement of proline-71. Biochemistry 25, 6959-6964.
Rassow, J., Mohrs, K., Koidl, S., Barthelmess, I.B., Pfanner, N., and Tropschug, M. (1995). Cyclophilin 20 is involved in mitochondrial protein folding in cooperation with molecular chaperones Hsp70 and Hsp60. Mol. Cell. Biol. 15, 2654-2662.
Redfield, C., Smith, R.A.G., and Dobson, C.M. (1994). Structural characterization of a highly-ordered 'molten globule' at low pH. Nature Struct. Biol. 1, 23-29.
Ridge, J.A., Baldwin, R.L., and Labhardt, A.M. (1981). Nature of the fast and slow refolding reactions of iron(III) cytochrome c. Biochemistry 20, 1622-1630.
Roder, H., Elove, G.A., and Englander, S.W. (1988). Structural characterization of folding intermediates in cytochrome c by hydrogen exchange labelling and proton NMR. Nature 700-704.
Rudolph, R., Siebendritt, R., Neslauer, G., Sharma, A.K., and Jaenicke, R. (1990). Folding of an all-β protein: independent domain folding in γII-crystallin from calf eye lens. Proc. Nat. Acad. Sci. USA 87, 4625-4629.
Saab-Rincon, G., Gualfetti, P.J., and Matthews, C.R. (1996). Mutagenic and thermodynamic analyses of residual structure in the α subunit of tryptophan synthase. Biochemistry 35, 1988-1994.
Schellman, J.A. (1978). Solvent denaturation. Biopolymers 17, 1305-1322.
Schindelin, H., Marahiel, M.A., and Heinemann, U. (1993). Universal Nucleic Acid-Binding Domain Revealed by Crystal Structure of the B. subtilis Major Cold-Shock Protein. Nature 364, 164-168.
Schindler, T. and Schmid, F.X. (1996). Thermodynamic properties of an extremely rapid protein folding reaction. Biochemistry 35, 16833-16842.
Schindler, T., Herrler, M., Marahiel, M.A., and Schmid, F.X. (1995). Extremely rapid folding in the absence of intermediates: The cold-shock protein from Bacillus subtilis. Nature Struct. Biol. 2, 663-673.
Schmid, F.X. (1983). Mechanism of folding of ribonuclease A. Slow refolding is a sequential reaction via structural intermediates. Biochemistry 22, 4690-4696.
Schmid, F.X. (1992). Kinetics of unfolding and refolding of single-domain proteins. In: Protein Folding. (Creighton, T.E., Ed.), pp. 197-241. W.H. Freeman, New York.
Schmid, F.X. (1993). Prolyl Isomerase: Enzymatic Catalysis of Slow Protein-Folding Reactions. Annu. Rev. Biophys. Biomol. Struct. 22, 123-143.
Schmid, F.X. and Blaschek, H. (1981). A native-like intermediate on the ribonuclease A folding pathway. 2. Comparison of its properties to native ribonuclease A. Eur. J. Biochem. 114, 111-117.
Schmid, F.X., Mayr, L.M., Mücke, M., and Schonbrunner, E.R. (1993). Prolyl isomerases: Role in protein folding. Adv. Protein Chem. 44, 25-66.
Schnuchel, A., Wiltschek, R., Czisch, M., Herrler, M., Willimsky, G., Graumann, P., Marahiel, M.A., and Holak, T.A. (1993). Structure in solution of the major cold-shock protein from Bacillus subtilis. Nature 364, 169-171.
Schreiber, G. and Fersht, A.R. (1993). The Refolding of cis- and trans-Peptidylprolyl Isomers of Barstar. Biochemistry 32, 11195-11203.
Schulman, B.A. and Kim, P.S. (1996). Proline scanning mutagenesis of a molten globule reveals non-cooperative formation of a protein's overall topology. Nature Struct. Biol. 3, 682-687.
Schulman, B.A., Redfield, C., Peng, Z.Y., Dobson, C.M., and Kim, P.S. (1995). Different subdomains are most protected from hydrogen exchange in the molten globule and native states of human α-lactalbumin. J. Mol. Biol. 253, 651-657.
Seckler, R. and Jaenicke, R. (1992). Protein Folding and Protein Refolding. FASEB J. 6, 2545-2552.
Segawa, S.-I. and Sugihara, M. (1984). Characterization of the transition state of lysozyme unfolding. 1. Effect of protein-solvent interactions on the transition state. Biopolymers 23, 2473-2488.
Serrano, L., Kellis, J.T., Cann, P., Matouschek, A., and Fersht, A.R. (1992a). The folding of an enzyme. II. Substructure of Barnase and the contribution of different interactions to protein stability. J. Mol. Biol. 224, 783-804.
Serrano, L., Matouschek, A., and Fersht, A.R. (1992b). The folding of an enzyme. VI. The folding pathway of Barnase: Comparison with theoretical models. J. Mol. Biol. 224, 847-859.
Serrano, L., Matouschek, A., and Fersht, A.R. (1992c). The folding of an enzyme. III. Structure of the transition state for unfolding of Barnase analysed by a protein engineering procedure. J. Mol. Biol. 224, 805-818.
Serrano, L., Day, A.G., and Fersht, A.R. (1993). Stepwise mutation of Barnase to Binase. A procedure for engineering increased stability of proteins and an experimental analysis of the evolution of protein stability. J. Mol. Biol. 233, 305-312.
Shastry, M.C.R., Sauder, J.M., and Roder, H. (1998). Kinetic and structural analysis of submillisecond folding events in cytochrome c. Acc. Chem. Res. 31, 717-725.
Protein Folding
Shirley, B.A. (1995). Protein Stability and Folding. Humana Press, Totowa, NJ.
Sinclair, J.F., Ziegler, M.M., and Baldwin, T.O. (1994). Kinetic partitioning during protein folding yields multiple native states. Nature Struct. Biol. 1, 320-326.
Sosnick, T.R., Mayne, L., Hiller, R., and Englander, S.W. (1994). The barriers in protein folding. Nature Struct. Biol. 1, 149-156.
Sosnick, T.R., Mayne, L., and Englander, S.W. (1996). Molecular collapse: The rate-limiting step in two-state cytochrome c folding. Proteins: Struct. Funct. Genet. 24, 413-426.
Steinbacher, S., Seckler, R., Miller, S., Steipe, B., Huber, R., and Reinemer, P. (1994). Crystal structure of P22 tailspike protein: Interdigitated subunits in a thermostable trimer. Science 265, 383-386.
Steinmann, B., Bruckner, P., and Superti-Furga, A. (1991). Cyclosporin A slows collagen triple-helix formation in vivo: Indirect evidence for a physiologic role of peptidyl-prolyl cis-trans-isomerase. J. Biol. Chem. 266, 1299-1303.
Stewart, D.E., Sarkar, A., and Wampler, J.E. (1990). Occurrence and role of cis peptide bonds in protein structures. J. Mol. Biol. 214, 253-260.
Sturtevant, J.M., Yu, M.-H., Haase-Pettingell, C., and King, J. (1989). Thermostability of temperature-sensitive folding mutants of the P22 tailspike protein. J. Biol. Chem. 264, 10693-10698.
Takahashi, N., Hayano, T., and Suzuki, M. (1989). Peptidyl-prolyl cis-trans isomerase is the cyclosporin A-binding protein cyclophilin. Nature 337, 473-475.
Takahashi, S., Yeh, S.R., Das, T.K., Gottfried, D.S., and Rousseau, D.L. (1997). Folding of cytochrome c initiated by submillisecond mixing. Nature Struct. Biol. 4, 44-50.
Tan, Y.J., Oliveberg, M., and Fersht, A.R. (1996). Titration properties and thermodynamics of the transition state for folding: Comparison of two-state and multi-state folding pathways. J. Mol. Biol. 264, 377-389.
Tanford, C. (1968a). Protein denaturation, Part A. Adv. Protein Chem. 23, 121-217.
Tanford, C. (1968b). Protein denaturation, Part B. The transition state from native to denatured state. Adv. Protein Chem. 23, 218-282.
Tanford, C. (1970). Protein denaturation, Part C. Adv. Protein Chem. 24, 1-95.
Tanford, C., Aune, K.C., and Ikai, A. (1973). Kinetics of unfolding and refolding of proteins. III. Results for lysozyme. J. Mol. Biol. 73, 185-197.
Teschner, W., Rudolph, R., and Garel, J.-R. (1987). Intermediates on the folding pathway of octopine dehydrogenase from Pecten jacobaeus. Biochemistry 26, 2791-2796.
Timerman, A.P., Wiederrecht, G., Marcy, A., and Fleischer, S. (1995). Characterization of an exchange reaction between soluble FKBP-12 and the FKBP ryanodine receptor complex. Modulation by FKBP mutants deficient in peptidyl-prolyl isomerase activity. J. Biol. Chem. 270, 2451-2459.
Todd, M.J., Lorimer, G.H., and Thirumalai, D. (1996). Chaperonin-facilitated protein folding: Optimization of rate and yield by an iterative annealing mechanism. Proc. Natl. Acad. Sci. USA 93, 4030-4035.
Tokatlidis, K., Friguet, B., Devillebonne, D., Baleux, F., Fedorov, A.N., Navon, A., Djavadiohaniance, L., and Goldberg, M.E. (1995). Nascent chains: Folding and chaperone interaction during elongation on ribosomes. Philos. Trans. R. Soc. Lond. [Biol]. 348, 89-95.
Tsong, T.-Y., Baldwin, R.L., and Elson, E.L. (1971). The sequential unfolding of ribonuclease A: detection of a fast initial phase in the kinetics of unfolding. Proc. Nat. Acad. Sci. USA 68, 2712-2715.
Tsong, T.-Y., Baldwin, R.L., and Elson, E.L. (1972). Properties of the refolding and unfolding reactions of ribonuclease A. Proc. Nat. Acad. Sci. USA 69, 1809-1812.
Tsuji, T., Chrunyk, B.A., Chen, X., and Matthews, C.R. (1993). Mutagenic analysis of the interior packing of an α/β barrel protein. Effects on the stabilities and rates of interconversion of the native and partially folded forms of the α subunit of Trp synthase. Biochemistry 32, 5566-5575.
Tsunenaga, M., Goto, Y., Kawata, Y., and Hamaguchi, K. (1987). Unfolding and refolding of a type κ immunoglobulin light chain and its variable and constant fragments. Biochemistry 26, 6044-6051.
Tweedy, N.B., Hurle, M.R., Chrunyk, B.A., and Matthews, C.R. (1990). Multiple replacements at position 211 in the α subunit of tryptophan synthase as a probe of the folding unit association reaction. Biochemistry 29, 1539-1545.
Udgaonkar, J.B. and Baldwin, R.L. (1988). NMR evidence for an early framework intermediate on the folding pathway of ribonuclease A. Nature 335, 694-699.
Udgaonkar, J.B. and Baldwin, R.L. (1995). Nature of the early folding intermediate of ribonuclease A. Biochemistry 34, 4088-4096.
Utiyama, H. and Baldwin, R.L. (1986). Kinetic mechanisms of protein folding. Meth. Enzymol. 131, 51-70.
Vanhove, M., Raquet, X., and Frere, J.-M. (1995). Investigation of the Folding Pathway of the TEM-1 β-Lactamase. Proteins: Struct. Funct. Genet. 22, 110-118.
Vanhove, M., Raquet, X., Palzkill, T., Pain, R.H., and Frere, J.M. (1996). The rate-limiting step in the folding of the cis-Pro167Thr mutant of TEM-1 β-lactamase is the trans to cis isomerization of a non-proline peptide bond. Proteins: Struct. Funct. Genet. 25, 104-111.
Vaucheret, H., Signon, L., Lebras, G., and Garel, J.-R. (1987). Mechanism of renaturation of a large protein, aspartokinase-homoserine dehydrogenase. Biochemistry 26, 2785-2790.
Viguera, A.R., Martinez, J.C., Filimonov, V.V., Mateo, P.L., and Serrano, L. (1994). Thermodynamic and kinetic analysis of the SH3 domain of spectrin shows a two-state folding transition. Biochemistry 33, 2142-2150.
Waldburger, C.D., Jonsson, T., and Sauer, R.T. (1996). Barriers to protein folding: Formation of buried polar interactions is a slow step in acquisition of structure. Proc. Natl. Acad. Sci. USA 93, 2629-2634.
Weissman, J.S. and Kim, P.S. (1991). Reexamination of the folding of BPTI: predominance of native intermediates. Science 253, 1386-1393.
Weissman, J.S. and Kim, P.S. (1992). Kinetic Role of Nonnative Species in the Folding of Bovine Pancreatic Trypsin Inhibitor. Proc. Natl. Acad. Sci. USA 89, 9900-9904.
Weissman, J.S. and Kim, P.S. (1995). A kinetic explanation for the rearrangement pathway of BPTI folding. Nature Struct. Biol. 2, 1123-1130.
Weissman, J.S., Hohl, C.M., Kovalenko, O., Kashi, Y., Chen, S.X., Braig, K., Saibil, H.R., Fenton, W.A., and Horwich, A.L. (1995). Mechanism of GroEL action: Productive release of polypeptide from a sequestered position under GroES. Cell 83, 577-587.
Wetlaufer, D.B. and Ristow, S. (1973). Acquisition of three-dimensional structure of proteins. Ann. Rev. Biochem. 42, 135-158.
White, T.B., Berget, P.B., and Nall, B.T. (1987). Changes in conformation and slow refolding kinetics in mutant iso-2-cytochrome c with replacement of a conserved proline residue. Biochemistry 26, 4358-4366.
Willimsky, G., Bang, H., Fischer, G., and Marahiel, M.A. (1992). Characterization of cspB, a Bacillus subtilis inducible cold shock gene affecting cell viability at low temperature. J. Bacteriol. 174, 6326-6335.
Wistow, G., Turnell, B., Summers, L., Slingsby, C., Moss, D., Miller, L., Lindley, P., and Blundell, T. (1983). X-ray analysis of the eye lens protein γ-II crystallin at 1.9 Å resolution. J. Mol. Biol. 170, 175-202.
Wood, L.C., White, T.B., Ramdas, L., and Nall, B.T. (1988). Replacement of a conserved proline eliminates the absorbance-detected slow folding phase of iso-2-cytochrome c. Biochemistry 27, 8562-8568.
Wu, L.C., Peng, Z.Y., and Kim, P.S. (1995). Bipartite structure of the α-lactalbumin molten globule. Nature Struct. Biol. 2, 281-286.
Wunderlich, M., Otto, A., Maskos, K., Mücke, M., Seckler, R., and Glockshuber, R. (1995). Efficient catalysis of disulfide formation during protein folding with a single active-site cysteine. J. Mol. Biol. 247, 28-33.
Yem, A.W., Tomasselli, A.G., Heinrikson, R.L., Zurcherneely, H., Ruff, V.A., Johnson, R.A., and Deibel, M.R. (1992). The Hsp56 component of steroid receptor complexes binds to immobilized FK506 and shows homology to FKBP-12 and FKBP-13. J. Biol. Chem. 267, 2868-2871.
Yoo, S.H., Myszka, D.G., Yeh, C.Y., Mcmurray, M., Hill, C.P., and Sundquist, W.I. Molecular recognition in the HIV-1 capsid/cyclophilin A complex. J. Mol. Biol. 269, 780-795.
Zehfus, M.H. and Rose, G.D. (1986). Compact units in proteins. Biochemistry 25, 5759-5765.
Zhao, Y.D., Chen, Y.Q., Schutkowski, M., Fischer, G., and Ke, H.M. Cyclophilin A complexed with a fragment of HIV-1 gag protein: Insights into HIV-1 infectious activity. Structure 5, 139-146.
Ziegler, M.M., Goldberg, M.E., Chaffotte, A.F., and Baldwin, T.O. (1993). Refolding of luciferase subunits from urea and assembly of the active heterodimer. J. Biol. Chem. 268, 10760-10765.
Chapter 6
Thermodynamics of Protein Folding and Stability
ALAN COOPER
Abstract
Introduction
Semantics: Definitions and General Considerations
Thermodynamics
Thermal Energies and Fluctuations
The Two-State Approximation
Thermodynamics of Unfolding: Reversible Globular Proteins
Differential Scanning Calorimetry
Thermodynamics of Unfolding: Empirical Data
Cold Denaturation
Thermodynamics of Unfolding: The Molecular Interpretation
Effect of Ligand-Binding on Folding Thermodynamics
Effect of pH
Electrostatic Interactions
Denaturant and Osmolytes
"Molten Globules" and Other Nonnative States
Reversibility
Effects of Crosslinking
Fibrous Proteins
Membrane Proteins
Finale
Notes
In Memoriam: Christian B. Anfinsen (1916-1995)1
Protein: A Comprehensive Treatise Volume 2, pages 217-270 Copyright © 1999 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 1-55938-672-X
ABSTRACT
The fundamental concepts of protein conformation and interatomic interactions are introduced, and the basic principles of thermodynamics relevant to the folding of proteins are presented. Thermodynamic energy fluctuations are important in small systems such as protein molecules. Differential scanning calorimetry of solutions of globular proteins that unfold reversibly yields unambiguous thermodynamic data, which are discussed. Molecular interpretations of the thermodynamics of unfolding are reviewed. Protein unfolding is affected by ligand binding, pH, and denaturants. The clearest interpretations of calorimetric data for protein unfolding assume a two-state model, but there is evidence for intermediate, partly folded states such as the "molten globule" in some circumstances. Rigorous thermodynamic analysis of protein unfolding requires reversibility; lack of reversibility may be revealed by scanning calorimetry, in which case detailed interpretation of the data is restricted. Crosslinking by disulphide bonds affects the thermodynamics of unfolding. Study of the thermodynamics of unfolding of fibrous and membrane proteins presents technical difficulties, but some insight has been obtained.
INTRODUCTION
Remarkable early work, notably by Hsien Wu and others (Wu, 1931; Anfinsen and Scheraga, 1975; Edsall, 1995), established the idea that denaturation of soluble proteins involved transitions from a relatively compact orderly structure to a more flexible, disorganized, open polypeptide chain. It was also known at this time that denaturation could be reversed. But it was the work of Anfinsen and colleagues in the late 1950s on the refolding of polypeptides that really galvanized interest in the physical chemistry of this process, particularly since it came at the time when the molecular basis for the genetic code was being established (Anfinsen, 1973). The ability of polypeptides with appropriate primary sequence to fold into active native structures without, necessarily, the intervention of external agencies completes a vital link in the chain leading to expression of genetic information. Under the correct physicochemical conditions, the folding of a protein is spontaneous and determined solely by its amino acid sequence. Once a gene is expressed, having been translated into a specific polypeptide sequence, thermodynamics (possibly guided by kinetics) takes over and the intrinsically flexible, irregular polymer chain folds into the more compact, specific structure required (usually) for biological function.
This ability of a polypeptide to select one conformation spontaneously, and usually quite rapidly, from a myriad of alternatives has given rise to what has come to be called "The Protein-Folding Problem." This is really not just one problem but several, involving basic questions such as: How? Why? Whether? How a protein folds is a question (or series of questions) relating to mechanism: What are the pathways involved in the process whereby the unfolded protein (whatever that is) reaches the folded state? What are the kinetics? What intermediates are involved, if any, and are they unique? What are the rate-limiting steps? And so forth. This area has come much more to the forefront recently with the demonstration of "chaperone" and related effects in protein folding. It is also of considerable interest to those attempting the awesome task of predicting protein structures from amino acid sequences, since the shortcuts taken by the protein itself may help in suggesting effective algorithms for predictive methods. However, these are treated more fully elsewhere in this series (Heringa et al., 1997). Why a protein folds relates to the even more fundamental thermodynamic problem of the underlying molecular interactions responsible for stabilizing the folded conformation relative to other intrinsically more likely irregular states of the polypeptide. This is the subject to be covered here. Whether a protein folds depends on both of the above. In order for a particular polypeptide sequence to adopt spontaneously a functionally effective conformation, the folded form must have a lower thermodynamic free energy than the galaxy of other available conformations. The folded conformation must also be kinetically attainable, with appropriate pathways, no unattainable intermediate states, and no irreversible kinetic traps.
My aim in this chapter is to review the thermodynamic background to protein folding and stability, with an overview of the current picture as I see it. Many detailed reviews in this area have appeared (Tanford, 1968, 1970; Privalov, 1979, 1982; Murphy and Freire, 1992), some of them very recently (Dill and Stigter, 1995; Honig and Yang, 1995; Lazaridis et al., 1995; Makhatadze and Privalov, 1995), and it is not my intention to cover the same ground in as much detail as can be found there. Rather, I will try to provide sufficient basic background to allow understanding and critical appraisal of this work by nonspecialist readers.
Semantics: Definitions and General Considerations
Many of the conceptual difficulties in this field, especially for newcomers, arise from semantics: the way in which the same or apparently similar terminology is used to mean different things by different workers. Consciously or unconsciously, people with different backgrounds can use the same terms to mean entirely different things. The definitions of terms may also change over time, so the same terms encountered in some of the older literature may not carry the same meaning in more recent work. The term "random coil" is a case in point. To a polymer chemist this might mean a highly flexible, dynamic, fluctuating, disordered chain structure in which no one molecule or region of a molecule is like any other. To a protein crystallographer, however, this same term might be used to refer to those regions of a protein structure that do not contain any recognizable helix, sheet, or other motif, but that nonetheless have a quite fixed, well-defined conformation identical from one molecule to the next. Because it is important not to be confused by conflicting terminology, in the next few sections I will try to clarify what I mean by the various possible conformational states of a polypeptide and the sorts of interactions that might be responsible for their occurrence.
Semantics I: Conformational States
Although polypeptides are inherently flexible polymers, we should be clear right from the very start that the "random coil" is the least likely state of any polypeptide in water. Free rotations about the torsional angles (φ, ψ) of the peptide unit would allow a myriad of potential chain conformations.2 But these rotations are by no means "free." Simple steric constraints, epitomized in the classic Ramachandran plot, restrict the range of realistically attainable φ-ψ angles even for a polypeptide in a vacuum. The physical bulk of peptide atoms and side-chain groups prevents close encounters or overlap (except at a very high energy cost) and means that only relatively limited areas of φ-ψ space are available. Moreover, polypeptide is intrinsically "sticky stuff" (one of the most abundant proteins, collagen, takes its name from the Greek κόλλα = glue) and water is a far-from-ideal solvent. Hydrogen bonding of water molecules to peptide backbone -NH and -C=O groups will further restrict conformational freedom. Interactions, however transient, between peptide groups and side-chain residues on the polypeptide will also play a part. (At higher concentrations, interactions between adjacent polypeptide molecules are also a factor of considerable importance, often leading to coagulation or aggregation of denatured proteins.)
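To put a rough number on "a myriad," the short Python sketch below counts backbone conformations under a deliberately crude assumption that is not in the text: each residue independently occupies one of a small number of coarse φ-ψ regions (three is used here purely for illustration, and real residues are not independent). Even with steric restriction to so few regions, the count grows exponentially with chain length. The function name is hypothetical.

```python
import math


def conformation_count(n_residues: int, states_per_residue: int) -> int:
    """Toy count of backbone conformations.

    Assumes (hypothetically) that every residue independently adopts one of
    `states_per_residue` coarse phi-psi regions; real chains are correlated.
    """
    return states_per_residue ** n_residues


for n in (10, 50, 100):
    count = conformation_count(n, 3)
    # math.log10 accepts arbitrarily large Python integers
    print(f"{n:4d} residues, 3 regions each: ~10^{math.log10(count):.1f} conformations")
```

Under these assumptions a 100-residue chain already has about 10^47.7 backbone conformations, which is why restricting φ-ψ space still leaves an enormous range of possibilities.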
Even so, the range of available conformations is enormous, and we must choose our language carefully when attempting to describe them. Traditionally, emphasis is placed on the backbone conformations that a polypeptide might adopt, since these are easiest to describe. Hence if we could take a snapshot of an individual polypeptide, we might see differing amounts of:
Regular structure: involving a repeating pattern of φ-ψ angles, with defined H-bond connectivity, giving rise to the familiar α-helix, β-sheets, 3₁₀ helix
Irregular structure: involving stretches of peptides with no repeating pattern of φ-ψ angles; differing patterns of H-bonding, including hydrogen bonding to surrounding water molecules
Motif structure: commonly occurring patterns of adjacent φ-ψ angles spanning just a few amino acids, not necessarily with regularity, but giving a recognizable conformational feature (e.g., β-bends, turns)
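The idea of regular structure as "a repeating pattern of φ-ψ angles" can be illustrated with a toy Python sketch. The region boundaries below are hypothetical and deliberately coarse (they are not from the text), the sketch ignores H-bond connectivity entirely, and it cannot tell an α-helix from a 3₁₀ helix, since both fall in the same helical region. It simply bins each residue and reports runs of at least four consecutive residues in the same helical or extended region; stretches that produce no such run would count as irregular, and motif detection (turns, bends) is not attempted.

```python
from itertools import groupby


def ramachandran_bin(phi: float, psi: float) -> str:
    """Assign a residue to a coarse phi-psi region (angles in degrees).

    Hypothetical, deliberately crude boundaries chosen for illustration only.
    """
    if -160 < phi < -20 and -120 < psi < 50:
        return "A"  # right-handed helical region
    if -180 <= phi < -45 and (psi > 90 or psi < -150):
        return "B"  # extended (beta) region
    return "O"  # anything else


def regular_segments(angles, min_len=4):
    """Return (first, last, region) for runs of >= min_len consecutive residues
    repeating the same A or B region: a toy stand-in for regular structure."""
    bins = [ramachandran_bin(phi, psi) for phi, psi in angles]
    segments, start = [], 0
    for region, run in groupby(bins):
        length = len(list(run))
        if region != "O" and length >= min_len:
            segments.append((start, start + length - 1, region))
        start += length
    return segments


helix = [(-57, -47)] * 6
loop = [(60, 30), (80, -170), (60, 40)]
strand = [(-120, 130)] * 5
print(regular_segments(helix + loop + strand))  # [(0, 5, 'A'), (9, 13, 'B')]
```

A real secondary-structure assignment would also use the H-bond pattern, which is exactly what the definition of regular structure above requires.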
In a population of polypeptide molecules, each of these structural classes might be one of the following:
Homogeneous: identical conformation in all molecules, with any one molecule superimposable upon another
Heterogeneous: different conformations from one molecule to another, with different φ-ψ angles, H-bond connectivity, hydration, and so forth
And this latter conformational heterogeneity might be one of two possibilities:
Static: unchanging with time
Dynamic: changing randomly/stochastically with time in any one molecule
[Similar considerations apply equally to side-chain conformations, though these are rarely analyzed because of their complexity.] It is worth emphasizing here that all protein molecules, whether folded or not, are dynamically heterogeneous, just like any other substance above absolute zero. On a short enough timescale, and over short enough distances:
• No part of any protein is ever static.
• No protein molecule ever has exactly the same conformation as any other.
• No protein molecule ever exists in the same conformation twice.
This is simply an unavoidable consequence of thermodynamics and the nature of heat (Cooper, 1976, 1984; Brooks et al., 1988), and might be pictured as just another manifestation of Brownian motion at the (macro)molecular level. The timescale for dynamic fluctuations might be anything from femtoseconds to kiloseconds, and their experimental/functional consequences will depend on the relevant observational timescale. The magnitudes of the conformational fluctuations will be mostly small, involving thermal vibration, libration, and torsion of individual groups, but much larger effects are also possible (Cooper, 1984). Against this background and given these definitions, how might we recognize, classify, or define the different conformational states of a protein? Perhaps as follows:
Folded: the biologically active ("native") form of the polypeptide (usually); compact, showing extensive average conformational homogeneity with recognizable regions of regular, irregular, and motif structures on a background of dynamic thermal fluctuations; well-defined H-bond connectivity, much of it internalized, with secondary and tertiary structure characteristic of the particular protein
Unfolded: everything else! An ill-defined state, or rather set of states, comprising anything that is not recognizably folded; a population of conformations spanning and sampling wide ranges of conformation space depending on conditions; usually quite open, irregular, heterogeneous, flexible, dynamic structures, in which no one molecule is like another, nor like itself from one moment to another; not necessarily "random coil" (see below), since some residual, transient secondary structure is possible
As subsets of the latter unfolded states, we might have the categories below.
Misfolded: partially or incorrectly folded conformers bearing some similarity to the native fold but with regions of nonnative, possibly heterogeneous structure; might result from kinetic traps or from chemical modification (proline isomerization, disulphide rearrangements, etc.)
Aggregated: the classic "denatured," coagulated protein state; intractable masses of entangled, unfolded polypeptide; the usual product of thermal unfolding of large proteins; usually heterogeneous, but may contain regions of regular structure
Molten Globule: a relatively compact, globular set of conformations with much regular, secondary structure in the polypeptide backbone, but with side-chain disorder; first characterized by affinity for hydrophobic probes; popular candidates as intermediates in the folding pathway (Ptitsyn, 1995; Privalov, 1996) [Caution: not all workers agree on a definition for "molten globule"!]
Random Coil: this is the (hypothetical) state in which the conformation of any one peptide group is totally uncorrelated with that of any other in the chain, particularly its neighbors; all polypeptide conformations are equally likely, equally accessible, and of equal energy; populations of such molecules would show complete conformational heterogeneity; this state is almost certainly never found for any polypeptide in water! (Though, unfortunately, the term is sometimes usurped by protein crystallographers to describe the regions of their structures, such as loops, that are not immediately identifiable as any of the regular structures or motifs. These are best described as irregular structure, and may be homogeneous or heterogeneous, and static or dynamic, depending on circumstances.)
Semantics II: Interactions
Another semantic minefield is encountered when considering the forces responsible for biomolecular interactions. Although in principle the energy of any state of a macromolecular system should be obtainable by solution of the appropriate quantum mechanical (Schrödinger) equations, such an approach is not yet feasible except in very special and well-defined circumstances. And even if it were feasible, such calculations would be conceptually unhelpful and would lack the thermodynamic dimension that might relate derived parameters to experimental observables. In such a situation, it has been traditional to be guided by analogy and experience from other areas of the physical chemistry of (generally) small molecules, and to attempt to break down the overall interaction into discrete categories of pairwise interactions between recognizable molecular groupings. This is the origin of more-or-less familiar terms such as "bonded," "nonbonded," "noncovalent," "polar," "electrostatic," "hydrogen bond," "hydrophobic," "solvation," "van der Waals," and "dispersion" interactions, among others. Bonded interactions are usually considered to be those directly involved in the covalent links between adjacent atoms. Stretching, bending, or rotation of these bonds, either in the polypeptide backbone or side-chain groups, will require work and will change the total energy of the system. Covalent bond stretching or bending is particularly hard work and requires energies that are usually beyond the normal range for thermal motions. Consequently, it is usually assumed that covalent bonds in proteins adopt their minimum-energy, least-strained conformations (bond lengths and angles) wherever possible. Except for the peptide group, however, rotation about many covalent bonds is relatively easy, and this is the source of inherent flexibility in the unfolded polypeptide. Nonbonded or noncovalent interactions are those between atoms or groups that are separated by more than one covalent bond.
Confusingly, such interactions may be referred to as being "short-range" or "long-range," either in terms of the through-space distances between groups or, frequently, in terms of separation in sequence along the polypeptide chain. Consequently, a noncovalent interaction between two amino acid residues might be "long-range" if the residues are separated by long stretches of polypeptide in the primary sequence, yet at the same time "short-range" if, through folding, the groups lie next to each other in space. Noncovalent interactions may be broken down into the familiar categories listed above. Although it is not possible to give more than a qualitative description of the thermodynamic characteristics of each of these interaction categories at this stage, a brief description here might be useful. More details will emerge later in discussion of the folding problem.
Van der Waals or London dispersion forces are the ubiquitous attractive interactions between all atoms and molecules that arise from quantum mechanical fluctuations in the electronic distribution. They are consequences of the Heisenberg uncertainty principle. Transient fluctuations in electron density distribution in one group will produce changes in the surrounding electrostatic field that will affect adjacent groups. In the simplest picture, a transient electric dipole will polarize or induce a similar but opposite dipole in an adjacent group such that the two transient dipoles attract. This induced dipole-dipole interaction is truly short range, its energy varying as the inverse sixth power of the separation distance, and such interactions are usually only of significance for groups in close contact. The strength of the interaction also depends on properties such as the high-frequency polarizability of the groups involved, but apart from this, such interactions involve very little specificity. All atoms or groups will show van der Waals attractions for each other. Also sometimes included in van der Waals interactions is the very steep repulsive potential between atoms in close contact ("van der Waals contact"). This arises from repulsion between overlapping electronic orbitals of atoms in noncovalent contact, which makes atoms behave almost like hard, impenetrable spheres at sufficiently short range. Thermodynamically, van der Waals interactions would normally be considered to contribute to the enthalpy of interactions, with no significant entropy component.
Permanent dipoles and charges within molecules or groups give rise to somewhat longer range and more specific electrostatic interactions. Discrete charge-charge or dipole-dipole interactions may be attractive or repulsive depending on sign and orientation. A particularly close, direct electrostatic interaction between ionized residues in a structure might be called a "salt bridge." Permanent dipoles or other electronic distributions may also polarize surrounding groups to give static induced dipoles that may interact attractively.
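The distance dependences described above can be made concrete with a short Python sketch. It rests on assumptions not stated in the text: the van der Waals part uses the common Lennard-Jones 12-6 form (r⁻¹² repulsion standing in for the steep contact repulsion, r⁻⁶ for the dispersion attraction) in reduced units of a well depth ε and contact distance σ, and the charge-charge part uses Coulomb's law with a uniform dielectric constant, a crude continuum stand-in for the solvent effects discussed in the next paragraph. The function names are illustrative only.

```python
import math

R_KJ = 8.314462618e-3   # gas constant, kJ mol^-1 K^-1
COULOMB_KJ = 1389.35    # e^2 / (4 pi eps0) expressed in kJ mol^-1 Angstrom


def rt(temperature_k: float) -> float:
    """Thermal energy scale RT in kJ mol^-1."""
    return R_KJ * temperature_k


def lennard_jones_reduced(r_over_sigma: float) -> float:
    """Lennard-Jones 12-6 energy in units of the well depth epsilon.

    The r^-12 term mimics steep contact repulsion; the r^-6 term mimics
    dispersion attraction. The functional form is a modelling assumption.
    """
    x6 = r_over_sigma ** -6
    return 4.0 * (x6 * x6 - x6)


def coulomb_kj(q1: float, q2: float, r_angstrom: float, dielectric: float) -> float:
    """Charge-charge energy in kJ mol^-1 (charges in units of e), assuming a
    uniform dielectric: negative means attractive, positive repulsive."""
    return COULOMB_KJ * q1 * q2 / (dielectric * r_angstrom)


r_min = 2 ** (1 / 6)  # position of the LJ minimum in units of sigma
print(f"LJ minimum at r = {r_min:.3f} sigma, energy = {lennard_jones_reduced(r_min):.3f} epsilon")
print(f"RT at 298.15 K: {rt(298.15):.2f} kJ/mol")
for eps in (1.0, 78.5):  # vacuum, and roughly water at 25 degrees C
    energy = coulomb_kj(+1, -1, 4.0, eps)
    print(f"+1/-1 pair at 4 A, dielectric {eps:5.1f}: {energy:8.2f} kJ/mol")
```

With these assumptions the sketch places the Lennard-Jones minimum of -1 ε at about 1.122 σ, gives RT ≈ 2.48 kJ mol⁻¹, and puts a +1/-1 pair 4 Å apart at about -347 kJ mol⁻¹ in vacuum but only about -4.4 kJ mol⁻¹ in a dielectric of 78.5, comparable to RT. Doubling a separation weakens the r⁻⁶ attraction 2⁶ = 64-fold, which is why dispersion matters only for groups in close contact.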
The complete description of the electrostatics of the polypeptide, folded or otherwise, must also take into account interactions with surrounding solvent water molecules and other ionic species in solution. This means that the thermodynamic description of such interactions is complicated and includes both enthalpy and entropy terms. For example, even the apparently simple process of dissolving a crystalline salt in water can be endothermic or exothermic, depending on ion size and other factors, and can be dominated by entropic contributions from solvation, restructuring of water around ions, or other indirect effects not normally visualized in the simple pulling apart of charged species. Comprehensive studies of protein and related electrostatics are described by Honig et al. (1993).
Hydrogen bonds are now normally considered to be examples of particularly effective electrostatic interaction between permanent electric dipoles, especially in proteins between groups such as -NH and -C=O or -OH, and the -NH···O=C interaction is of particular historical importance for the part it played in predictions of regular helical or sheet conformations. In theoretical calculations, H-bond interactions may be handled either discretely as separate "bonds" or incorporated into the overall electrostatics of the protein. The thermodynamic contribution of hydrogen bonds to protein stability or other biomolecular interactions is surprisingly unclear. Also the term "strength of the hydrogen bond" is very ambiguous.
This is because liquid water is a very good hydrogen-bonding solvent. Breaking a hydrogen bond between two groups in a vacuum requires a significant amount of energy, in the region of 25 kJ mol⁻¹ for a peptide hydrogen bond, say (Rose and Wolfenden, 1993; Lazaridis et al., 1995). But in water, such exposed groups would likely form new H-bonds to surrounding water molecules that largely cancel the effect, and the true "strength of a hydrogen bond" between groups in an aqueous environment might be closer to zero. The overall interaction will also include significant entropy contributions because of this solvent involvement. The usually excellent solubility of polar compounds in water reflects this, and model compound studies generally lead to a picture in which hydrogen bonds contribute little if anything to the free energy of folding of a polypeptide chain (Klotz and Farnham, 1968; Kresheck and Klotz, 1969; and others, see Dill, 1990a). [They will, of course, determine the specific conformation adopted by the polypeptide when it does fold.]
Hydrophobic interactions are another manifestation of the peculiar hydrogen-bonding properties of water. Based on the empirical observation that nonpolar molecules are poorly soluble in water, this interaction is probably best visualized as a repulsive interaction between nonpolar groups and water rather than a direct attraction between those groups. Nonpolar, hydrophobic groups in water will tend to cluster together because of their mutual repulsion from water, not necessarily because they have any particular direct affinity for each other. The thermodynamics of this interaction are interesting (Kauzmann, 1959; Tanford, 1980). Based on studies of small nonpolar molecules, the separation or pulling apart of two hydrophobic groups in water is an exothermic process. In other words, although it generally requires work to separate such groups, heat is given out in the process.
This is usually described in terms of structural rearrangements of water molecules at the molecular interface, but the molecular description is really less relevant than the empirical observation. This exothermic effect is opposed by a significant and thermodynamically unfavorable reduction in entropy of the system, also attributable to solvent structure rearrangements. The reverse process, that is, the association of nonpolar groups to form a "hydrophobic bond," is consequently said to be "entropy driven" and comes about spontaneously even though it is endothermic. The enthalpies or heats of such processes are also characteristically temperature-dependent (the $\Delta C_p$ effect; see later), and this has provided some of the stronger evidence for the role of such interactions in protein folding.
Thermodynamics
We know from experience that transformation of a protein between various conformational states might be brought about by changes in temperature, pressure, pH, ligand concentration, chemical denaturants, or other solvent changes. For each of these empirical variables there will be a set of associated thermodynamic parameters, and it is axiomatic (Le Chatelier's Principle) that a transformation may only come about if the two states have different values for these parameters. For example, temperature-induced protein unfolding (at equilibrium) arises from differences in enthalpy ($\Delta H$) between folded and unfolded states; pressure denaturation can only occur if the folded and unfolded states have different partial molar volumes (the unfolded state is normally of lower volume); unfolding at high or low pH implies differences in $pK_a$ of protein acidic or basic groups; ligand-induced unfolding or stabilization of the native fold results from differences in binding affinity for ligand in the two states; chemical denaturants may act as ligands, binding differently to folded or unfolded states, or may act indirectly via changes in overall solvent properties. In each of these cases we need to know how to measure and interpret these thermodynamic parameters. One important observation is that the "folded ⇌ unfolded" transition is highly cooperative, at least for small globular proteins, frequently behaving as an almost perfect two-state equilibrium process akin to a macroscopic phase change (see Dill et al., 1995). This feature will be discussed in more detail later. But our task here is to describe how the thermodynamics of transition between these various states may be measured and interpreted, leading to a possible understanding of why the native folded form is usually the more stable state under relevant conditions. The arguments must necessarily be thermodynamic. We have already had cause to use terms such as "enthalpy," "entropy," and "free energy," and it is important to be clear what these terms mean. Experts in thermodynamics may skip the next section.
Basic Thermodynamics: A Primer
Thermodynamics can be a daunting subject. For that reason it is perhaps useful to summarize here the basic concepts, presented in a somewhat less conventional manner than found in the usual textbooks. What follows is a very unrigorous and highly abbreviated sketch of the basic ideas of "molecular thermodynamics" or "statistical mechanics," starting from a molecular point of view and leading to classical thermodynamic relations. My aim is to encourage basic understanding of thermodynamic expressions in a way that may make standard texts more readable to the nonexpert.
Except at absolute zero, all atoms and molecules are in perpetual, chaotic motion. Things we feel, like "heat" and "temperature," are just macroscopic manifestations of this motion. Although in principle one might think it possible to calculate this motion exactly (using Newton's laws of motion or quantum mechanical equivalents), in practice this is impracticable for systems containing more than just a few molecules over a realistic timescale, and downright impossible for macroscopic objects containing on the order of $10^{23}$ molecules. And in any case, the information given by such a calculation would be far too detailed to be of any real use. The way out of this problem is to take a statistical approach (statistical mechanics or thermodynamics) and concentrate on the average or most probable behavior of the molecules. This will give the mean properties: what we observe for a sample containing large numbers of molecules, or the time-averaged behavior of a single molecule.
The basic rule, a paraphrase of the Second Law of Thermodynamics at the molecular level, is that the most probable things generally happen. The statistical probability ($p_A$) that any molecule or system (collection of molecules) is to be found in some state, A, depends on the energy ($E_A$) of the system together with the number of ways ($w_A$) that energy may come about. This is expressed in the Boltzmann probability formula:
$$p_A = w_A \exp(-E_A/k_BT) \qquad (1)$$
where $T$ is the absolute temperature (in kelvin), $k_B$ is Boltzmann's constant ($k_B = 1.38 \times 10^{-23}$ J K⁻¹) and, again, $E_A$ is the total energy of the system, comprising all the molecular kinetic, rotational, and vibrational energy together with energy due to interactions ("bonds") within and between the molecules in the system, and $w_A$ is the number of ways in which that total energy may be achieved or distributed. Some points of detail now need to be taken into account. Firstly, it is conventional and convenient to think in terms of moles of molecules rather than actual numbers of molecules in the system. Therefore we may multiply numerator and denominator of the energy exponent ($-E_A/k_BT$) by Avogadro's number ($N_A$), remembering that the gas constant $R = N_Ak_B = 8.314$ J K⁻¹ mol⁻¹, and redefine $E_A$ as the total energy per mole, to give $-E_A/RT$ in the exponential factor. Secondly, since most of the time we work under conditions of constant pressure, we need to make sure that the energy accounting is properly formulated to take account of any work terms arising from volume changes (to satisfy energy conservation, or the First Law of Thermodynamics). This is done by taking enthalpy ($H_A$) as the appropriate energy term. Formally, the enthalpy of a system is defined as:
$$H_A = U_A + PV \qquad (2)$$
where $U_A$ is the internal energy, comprising molecular kinetic, rotational, vibrational, and interaction energies in the system, and the pressure-volume term ($PV$) takes care of any energy changes due to work done on or by the surroundings. Putting these points together leads to an equivalent version of the Boltzmann probability factor:
$$p_A = w_A \exp(-H_A/RT) \qquad (3)$$
Now consider a situation where our system might also exist in another state B, say, with probability
$$p_B = w_B \exp(-H_B/RT) \qquad (4)$$
and is free to interconvert between the two. We might depict this chemically as:
$$A \rightleftharpoons B \qquad (5)$$
For a large population of molecules in the system (or for smaller numbers averaged over a period of time), the relative probability of finding the system in either state is equal to the conventional "equilibrium constant" ($K$) for the process:
$$K = [B]/[A] = p_B/p_A \qquad (6)$$
where [ ] denotes molar concentration.
Consequently, using the Boltzmann probability terms and after a little rearrangement, we might write
$$-RT\ln K = \Delta H^\circ - RT\ln(w_B/w_A) \qquad (7)$$
where $\Delta H^\circ = H_B - H_A$ is the (molar) enthalpy difference between the two states. This is equivalent to the classical thermodynamic expressions (see note 3)
$$\Delta G^\circ = -RT\ln K = \Delta H^\circ - T\Delta S^\circ \qquad (8)$$
provided we identify $\Delta S^\circ = R\ln(w_B/w_A)$. In other words:
1. The "standard Gibbs free energy change" ($\Delta G^\circ$) is just another way of expressing the relative probability ($p_B/p_A$) of finding the system in either state. If $\Delta G^\circ$ is positive, $p_B/p_A < 1$, and state B is relatively unlikely. If $\Delta G^\circ$ is negative, $p_B/p_A > 1$, and state B is the more likely. When $\Delta G^\circ = 0$, $p_A = p_B$, and either state is equally likely (or, equivalently, the system spends 50% of its time in either state).
2. The "standard entropy change" ($\Delta S^\circ$) is just an expression of the change in the number of ways in which the energy of the system in a particular state may be made up.
It is this latter that helps (me, at least) get a better feeling for the concept of entropy. Following Boltzmann, the absolute molar entropy of any system is given by $S = R\ln w$, and it is just a way of expressing the multiplicity of ways in which the system can be found with a particular energy, sometimes called the "degeneracy" of the system. (Elementary descriptions of entropy couched in terms of "randomness" or "disorder" can be confusing or ambiguous: for example, the distribution of symbols on this page might look somewhat random to someone who cannot read, but there is really only one way (or relatively few ways) that makes sense.) It is important to emphasize that the most probable (equilibrium) state of a system is determined by the Gibbs free energy, reflecting the relative probabilities, and that this is made up of a combination of energy (enthalpy) and entropy terms. Consequently, spontaneous processes need not necessarily involve a decrease in internal energy or enthalpy. Endothermic processes are quite feasible, indeed even common (e.g., the melting of an ice cube at room temperature), provided they involve a suitably large increase in entropy.
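The correspondence between Eqs. (3) to (8) can be checked numerically. The sketch below is a minimal illustration, assuming a hypothetical two-state system in which the enthalpy difference $\Delta H^\circ$ and the degeneracy ratio $w_B/w_A$ are simply given; the function name `two_state_equilibrium` and the numbers used are illustrative choices, not values from the text. It computes $K = p_B/p_A$ directly from the Boltzmann factors and confirms that $\Delta H^\circ - T\Delta S^\circ$, with $\Delta S^\circ = R\ln(w_B/w_A)$, equals $-RT\ln K$.

```python
import math

R = 8.314  # gas constant, J K^-1 mol^-1


def two_state_equilibrium(dH, w_ratio, T):
    """Hypothetical two-state A <-> B system built from Boltzmann factors.

    dH      -- H_B - H_A in J mol^-1
    w_ratio -- w_B / w_A, ratio of the numbers of ways (degeneracies)
    T       -- absolute temperature in K
    Returns (K, dG, dS): K = p_B/p_A, dG in J mol^-1, dS in J K^-1 mol^-1.
    """
    # Ratio of Eq. (4) to Eq. (3): K = (w_B/w_A) * exp(-(H_B - H_A)/RT)
    K = w_ratio * math.exp(-dH / (R * T))
    # Entropy as the log of the degeneracy ratio: dS = R ln(w_B/w_A)
    dS = R * math.log(w_ratio)
    # Eq. (8), enthalpy-entropy form of the standard free energy change
    dG = dH - T * dS
    return K, dG, dS


if __name__ == "__main__":
    # Illustrative numbers: dH = 400 kJ/mol, w_ratio chosen so that K = 1 at 330 K
    dH = 400e3
    w_ratio = math.exp(dH / (R * 330.0))
    for T in (310.0, 330.0, 350.0):
        K, dG, dS = two_state_equilibrium(dH, w_ratio, T)
        print(f"T = {T:.0f} K  K = {K:.3g}  dG = {dG / 1e3:+.1f} kJ/mol  "
              f"-RT lnK = {-R * T * math.log(K) / 1e3:+.1f} kJ/mol")
```

With these illustrative numbers $K = 1$ and $\Delta G^\circ = 0$ at 330 K; below that temperature the higher-enthalpy state B is the minority state, and above it the entropy term wins.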
The exponential nature of the Boltzmann probability expression seems to imply that low-energy states are more likely and that things should tend to roll downhill to their lowest energy (enthalpy) state, as they do in conventional mechanical systems. All things being equal, that is what happens thermodynamically too. However, this is generally offset by the "w" term: the higher the energy, the greater the number of different ways to distribute this energy to reach the same total. Except in special cases, the very lowest energy state of any system has all molecules totally at rest in precise locations (on lattice sites, for example), and there is generally only one way that this can be done ($w = 1$, $S = 0$). For higher energy states, however, there will be more ways in which that energy can come about: some molecules might be rotating, others vibrating, others moving around in different directions, some forming hydrogen bonds, others not, and any combination of these in multiple ways can make up the same total energy. Indeed, the way in which the total energy is distributed will vary with time as a result of molecular collisions, and the higher the energy, the greater the number of ways there might be of achieving it. Expressed graphically (Figure 1), the decreasing exponential energy term combined with the increasing "w" component means that the most probable, average energy of any system is not the ground state (except at $T = 0$ K).
Heat Capacity
Both enthalpy and entropy are classical concepts related to the heat uptake or heat capacity of a system. Imagine starting with an object at absolute zero (0 K) in its lowest energy state. As we add heat energy, the temperature will rise and the molecules will begin to move around, bonds will break, and so forth. The amount of heat energy required to bring about a particular temperature increment depends on the properties of the system but is expressed in terms of the heat capacity. At constant pressure, the heat energy (dH) required to produce a temperature increment dT is given by
[Figure 1 plot: a curve labelled $w\cdot\exp(-E/kT)$, plotted against Energy ($E$).]
Figure 1. Graphical illustration of how the exponentially decreasing Boltzmann factor, combined with rapidly increasing degeneracy ($w$), gives an energy probability distribution of finite width peaking at energies above zero.
$$dH = C_p\,dT \qquad (9)$$
where $C_p$ is the heat capacity of the system at constant pressure. (Similar expressions are available for constant-volume situations, but these are rarely encountered in biophysical experiments.) Consequently, the total enthalpy of a system in a particular state at a particular temperature is simply the integrated sum of the heat energy required to reach that state from 0 K:
$$H = \int_0^T C_p\,dT + H_0 \qquad (10)$$
where $H_0$ is the ground-state energy (at 0 K) due to chemical bonding and other nonthermal effects. The magnitude of the heat capacity ($C_p$) depends on the number of ways there are of distributing any added heat energy to the system and is therefore related to entropy. Consider the energy required to bring about, say, a 1 K rise in temperature. If a particular system has only relatively few ways of distributing the added energy ($w$ small, entropy low), then relatively little energy will be required to raise the temperature, and such a system would have a relatively low $C_p$. If, however, there are lots of different ways in which the added energy can be spread around amongst the molecules in the system ($w$ high, entropy high), then much more energy will be needed to bring about the same temperature increment. Such a system would have a high $C_p$. This is expressed in the classical (Second Law) definition of an entropy increment (at constant pressure):
$$dS = dH/T = (C_p/T)\,dT \qquad (11)$$
so that the total entropy of any system is given by the integrated heat capacity expression:
$$S = \int_0^T (C_p/T)\,dT \qquad (12)$$
It is these equations and the variants below, connecting both enthalpy and entropy to heat capacity measurements, that make calorimetric methods potentially so powerful in determining these quantities experimentally in an absolute, model-free manner. When defined in this way, these quantities are absolute enthalpies and entropies of the system relative to absolute zero. But we are normally interested in changes in these quantities ($\Delta H$, $\Delta S$) from one state to the other at constant temperature (or over a limited range of temperatures close to physiological). These follow directly from the integral expressions above:
$$\Delta H = H_B - H_A = \int_0^T \Delta C_p\,dT + \Delta H(0) \qquad (13)$$
$$\Delta S = S_B - S_A = \int_0^T (\Delta C_p/T)\,dT \qquad (14)$$
where $\Delta C_p = C_{p,B} - C_{p,A}$ is the heat capacity difference between states A and B at a given temperature, and $\Delta H(0)$ is the ground-state (0 K) enthalpy difference between A and B. Most systems are assumed to have the same (zero) entropy at absolute zero (Third Law of Thermodynamics). It is frequently convenient to relate these quantities to some standard reference temperature $T_{ref}$ (e.g., $T_{ref} = 298$ K rather than 0 K), in which case:
$$\Delta H(T) = \Delta H(T_{ref}) + \int_{T_{ref}}^{T} \Delta C_p\,dT \qquad (15)$$
and
$$\Delta S(T) = \Delta S(T_{ref}) + \int_{T_{ref}}^{T} (\Delta C_p/T)\,dT \qquad (16)$$
This emphasizes that, if there is a finite $\Delta C_p$ between two states, then $\Delta H$ and $\Delta S$ are both temperature-dependent; this is the norm when weak, noncovalent interactions are involved, and it is particularly true for protein-folding transitions. (This effect is generally less significant, at least over a limited temperature range, for conventional chemical reactions involving covalent bond changes, where large energy differences between the two chemical states are manifest even at absolute zero as differences in ground-state energy.) If $\Delta C_p$ is independent of temperature (not necessarily true, but usually a reasonable approximation over a limited temperature range), then we can integrate the above to give approximate expressions for the temperature dependence of $\Delta H$ and $\Delta S$ with respect to some arbitrary reference temperature ($T_{ref}$):
$$\Delta H(T) = \Delta H(T_{ref}) + \Delta C_p\,(T - T_{ref}) \qquad (17)$$
$$\Delta S(T) = \Delta S(T_{ref}) + \Delta C_p \ln(T/T_{ref}) \qquad (18)$$
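Eqs. (15) to (18) can be compared directly. The sketch below assumes hypothetical protein-like values ($T_{ref} = T_m$ = 333.15 K, $\Delta H(T_m)$ = 400 kJ mol⁻¹, $\Delta C_p$ = 6 kJ K⁻¹ mol⁻¹, with $\Delta S(T_m) = \Delta H(T_m)/T_m$ so that $\Delta G(T_m) = 0$); these numbers and the function names are illustrative only. It evaluates the closed forms (17) and (18) for constant $\Delta C_p$ and, as a cross-check, the integrals (15) and (16) by the trapezoidal rule, which also accepts a temperature-dependent $\Delta C_p$.

```python
import math


def dH_dS_closed(T, T_ref, dH_ref, dS_ref, dCp):
    """Eqs. (17) and (18): Delta H(T) and Delta S(T) for constant Delta Cp."""
    dH = dH_ref + dCp * (T - T_ref)
    dS = dS_ref + dCp * math.log(T / T_ref)
    return dH, dS


def dH_dS_numeric(T, T_ref, dH_ref, dS_ref, dCp_func, n=2000):
    """Eqs. (15) and (16) by the trapezoidal rule; dCp_func(T) may vary with T."""
    h = (T - T_ref) / n  # negative when T < T_ref, giving the signed integral
    int_cp = 0.0
    int_cp_over_t = 0.0
    for i in range(n):
        t0 = T_ref + i * h
        t1 = t0 + h
        int_cp += 0.5 * h * (dCp_func(t0) + dCp_func(t1))
        int_cp_over_t += 0.5 * h * (dCp_func(t0) / t0 + dCp_func(t1) / t1)
    return dH_ref + int_cp, dS_ref + int_cp_over_t


def gibbs(T, dH, dS):
    """Delta G = Delta H - T Delta S at temperature T."""
    return dH - T * dS


if __name__ == "__main__":
    Tm, dHm, dCp = 333.15, 400e3, 6e3  # hypothetical values
    dSm = dHm / Tm                     # makes Delta G(Tm) = 0
    for T in (298.15, 313.15, 333.15, 353.15):
        dH, dS = dH_dS_closed(T, Tm, dHm, dSm, dCp)
        print(f"{T - 273.15:5.1f} C  dH = {dH / 1e3:6.1f} kJ/mol  "
              f"TdS = {T * dS / 1e3:6.1f} kJ/mol  dG = {gibbs(T, dH, dS) / 1e3:+6.2f} kJ/mol")
```

With these numbers $\Delta H$ falls from 400 to 190 kJ mol⁻¹ between 60 and 25 °C, while $\Delta G$ changes by only about 31 kJ mol⁻¹, a first glimpse of the compensation discussed next.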
This shows how $\Delta H$ and $\Delta S$ will both vary with temperature in the same direction. Thus, if $\Delta C_p$ is positive, both $\Delta H$ and $\Delta S$ will together increase with temperature, in line with intuition: a higher enthalpy implies higher molecular energy states, broken bonds, and the like. This is consistent with higher entropy and greater degeneracy of the system. Similarly, lower entropy states are usually associated with more ordered systems with concomitantly lower enthalpy. These synchronous changes in $\Delta H$ and $\Delta S$ with temperature tend to complement and cancel each other in the $\Delta G$ term, so the resulting changes in $\Delta G$ are significantly smaller. For example, for small changes in temperature $\delta T = T - T_{ref}$, using the approximation $\ln(1 + x) \approx x$ for $x \ll 1$:
$$\Delta H(T) = \Delta H(T_{ref}) + \Delta C_p\,\delta T \qquad (19)$$
$$\Delta S(T) = \Delta S(T_{ref}) + \Delta C_p \ln(1 + \delta T/T_{ref}) \approx \Delta S(T_{ref}) + \Delta C_p\,\delta T/T_{ref} \qquad (20)$$
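so the $\Delta C_p$ terms will partly (though not completely) cancel in $\Delta G$: substituting (19) and (20) into $\Delta G = \Delta H - T\Delta S$ gives $\Delta G(T) \approx \Delta G(T_{ref}) - \Delta S(T_{ref})\,\delta T - \Delta C_p\,\delta T^2/T_{ref}$, in which $\Delta C_p$ survives only in the second-order term. Moreover, over a limited temperature range for which this approximation is valid,
$$\Delta H(T) \approx \Delta H(T_{ref}) + T_{ref}\,[\Delta S(T) - \Delta S(T_{ref})] \qquad (21)$$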
so that a plot of $\Delta H$ versus $\Delta S$ would appear linear with slope $T_{ref}$. Though much could be made of the significance of such a linear correlation and of the nature of $T_{ref}$ as some sort of "characteristic temperature," it is simply a mathematical consequence arising from experimental data covering a limited temperature range. The $T_{ref}$ arising from such a correlation would simply be that temperature for which the approximation ($\delta T$ small) is most appropriate, that is, somewhere in the experimentally observable range. These effects are one example of the much broader phenomenon of "enthalpy-entropy compensation" (Lumry and Rajender, 1970; Grunwald and Steel, 1995; Dunitz, 1995, and references therein), whereby $\Delta H$ and $\Delta S$ changes brought about by various experimental conditions (in addition to temperature) tend to move in concert in such a way as to cancel almost quantitatively in $\Delta G$. Much has been made of this in terms of special solvent/water properties and so forth, but it is almost certainly just a simple manifestation of the intuitively reasonable properties of systems comprising multiple, weak, noncovalent interactions as described above: high enthalpy implies high entropy and vice versa (Weber, 1993, 1995; Dunitz, 1995).
The van't Hoff Enthalpy/Equation
The temperature dependence of the equilibrium constant for any process is a manifestation of the enthalpy of the process and forms the basis for widely used methods for estimating $\Delta H$. Given that
$$-RT\ln K = \Delta H^\circ - T\Delta S^\circ \qquad (22)$$
then
$$\ln K = -\Delta H^\circ/RT + \Delta S^\circ/R \qquad (23)$$
and
$$\frac{d\ln K}{d(1/T)} = -\frac{\Delta H^\circ}{R} \qquad (24)$$
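The claim in the note that follows, that Eq. (24) holds at every temperature even when $\Delta H^\circ$ and $\Delta S^\circ$ vary because of a finite $\Delta C_p$, can be checked by finite differences. The sketch below assumes a constant $\Delta C_p$ and hypothetical protein-like parameters; `ln_k` and `vant_hoff_slope` are illustrative names. The local slope of $\ln K$ against $1/T$ should track $-\Delta H(T)/R$, so the van't Hoff plot is curved rather than straight.

```python
import math

R = 8.314  # J K^-1 mol^-1


def ln_k(T, T_ref, dH_ref, dS_ref, dCp):
    """ln K = -Delta G / RT, with Delta H and Delta S from Eqs. (17)-(18)."""
    dH = dH_ref + dCp * (T - T_ref)
    dS = dS_ref + dCp * math.log(T / T_ref)
    return -(dH - T * dS) / (R * T)


def vant_hoff_slope(T, T_ref, dH_ref, dS_ref, dCp, du=1e-7):
    """Central-difference estimate of d(ln K)/d(1/T) at temperature T."""
    u = 1.0 / T
    args = (T_ref, dH_ref, dS_ref, dCp)
    up = ln_k(1.0 / (u + du), *args)
    down = ln_k(1.0 / (u - du), *args)
    return (up - down) / (2.0 * du)


if __name__ == "__main__":
    Tm, dHm, dCp = 333.15, 400e3, 6e3  # hypothetical values
    for T in (298.15, 318.15, 333.15):
        dH = dHm + dCp * (T - Tm)
        print(f"T = {T:.2f} K  slope = {vant_hoff_slope(T, Tm, dHm, dHm / Tm, dCp):.0f}"
              f"  -dH/R = {-dH / R:.0f}")
```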
{Note: this is true whether or not $\Delta H^\circ$ and $\Delta S^\circ$ vary with temperature. In general,
$$\frac{d\ln K}{d(1/T)} = -\frac{\Delta H^\circ}{R} - \frac{1}{RT}\frac{d\Delta H^\circ}{d(1/T)} + \frac{1}{R}\frac{d\Delta S^\circ}{d(1/T)} = -\frac{\Delta H^\circ}{R} \qquad (25)$$
since
$$\frac{d\Delta H^\circ}{d(1/T)} = -T^2\frac{d\Delta H^\circ}{dT} = -T^2\,\Delta C_p \qquad (26)$$
and
$$\frac{d\Delta S^\circ}{d(1/T)} = -T^2\frac{d\Delta S^\circ}{dT} = -T^2\,\Delta C_p/T \qquad (27)$$
so the latter two terms cancel in the above equation.}
As a consequence, a plot of experimental data of $\ln K$ versus $1/T$ ("van't Hoff plot") gives a line whose slope at any point is minus the van't Hoff enthalpy ($\Delta H^\circ$ or $\Delta H_{vH}$) divided by $R$. In simple cases, over a limited temperature range, this plot is linear (or is assumed to be so), but in general, the temperature dependence of $\Delta H$ (due to $\Delta C_p$) will result in a curved van't Hoff plot that needs more careful analysis (Naghibi et al., 1995). In practical terms, the analysis can be made even more complicated (and such methods less satisfactory for $\Delta H$ determination) by the natural tendency described above for $\Delta H$ and $\Delta S$ to vary with temperature in a complementary manner so as to cancel and give relatively smaller changes in $\Delta G$.
What is a "van't Hoff enthalpy"? To what does this energy refer? It is important to recognize that any van't Hoff analysis is based on a model or assumption about the process involved. Typically this will be a "two-state" model (see below) in which the equilibrium constant $K$ is a dimensionless ratio, usually determined indirectly from spectroscopic, calorimetric, or other measurements. In such a model, the molar van't Hoff enthalpy change, $\Delta H_{vH}$, is the enthalpy change per mole of cooperative unit (Sturtevant, 1974). More on this later.
Thermal Energies and Fluctuations
Since all molecules are always in perpetual thermal motion (and thermodynamics is merely a consequence of this), it is useful to bear in mind the average thermal energies involved in such motion. Classical statistical thermodynamics (the "equipartition theorem") shows that every independent form of motion or degree of freedom in a molecule has a mean thermal energy of $\tfrac{1}{2}k_BT$, where $k_B$ is Boltzmann's constant and $T$ is the absolute temperature. For kinetic energy or translational motions, there are three degrees of freedom, corresponding to movement along the x, y, and z axes, so the average kinetic energy is $3k_BT/2$. Similarly, for free rotational motion, the average energy will be about $\tfrac{1}{2}k_BT$ per rotational degree of freedom. Vibrational modes have two degrees of freedom each (one translational and one extensional, i.e., kinetic and potential), but at least for covalent bonds, the classical equipartition approximation breaks down. Quantization of vibrational levels has to be considered here, and conventional bond vibrations are rarely excited at normal temperatures. However, soft modes with frequencies of order 300 cm⁻¹ or less, such as might be found in global protein vibrations, will be thermally populated at physiological temperatures. A useful rule of thumb is that the average thermal energy associated with each motional degree of freedom in a molecule is of order $k_BT$ per molecule, or $RT$ per mole. This corresponds to about 2.5 kJ mol⁻¹ (0.6 kcal mol⁻¹) at room temperature.
There is another consequence of the statistical description of thermodynamics apparent from Figure 1. As with any statistical distribution, the energy probability of any system will have a finite width, and we should expect to see statistical fluctuations about the mean or most probable value. For large systems, the distribution is usually very sharp and fluctuations are not normally perceptible. But as systems get smaller, thermodynamic fluctuations become comparatively larger, as in Brownian motion, for example. For very small systems such as an individual protein molecule, the thermodynamic energy and volume fluctuations can be significant and play a definite role in the dynamic functions of the protein (Cooper, 1976, 1984).
The Two-State Approximation
Many experimental methods for estimating thermodynamic parameters for protein transitions rely on the assumption or approximation of "two-state" behavior for the system. The accuracy of the data thus obtained and the validity of their interpretation are critically dependent on the validity of this assumption. The two-state model assumes that the process of interest (or part of it) may be represented by a thermodynamic equilibrium between two experimentally distinguishable states, A ⇌ B, with no significant population of intermediate states and/or, equivalently, a relatively high kinetic activation barrier between them. This does not necessarily imply that A and B themselves are unique, homogeneous, static states. Consider an ice cube at 0 °C, for example. This is a classic example of a macroscopic phase transition described extremely well by the two-state approximation. The system can exist in one of two macroscopically distinguishable states: solid (ice) or liquid (water). At 0 °C and 1 atm pressure these two states can coexist, and the equilibrium can be shifted one way or the other by slight changes in temperature, pressure, or composition of the system (additives). There are no known intermediates, nothing halfway between solid and liquid. At the molecular level the ice → water melting transition brought about, say, by an increase in temperature is characterized by the breaking of (some) intermolecular hydrogen bonds and loss of regular crystal lattice structure. However, not all H-bonds are broken. Estimates differ, and it is not even clear that the term "broken hydrogen bond" is useful for the description of interactions of water molecules in the pure liquid (Eisenberg and Kauzmann, 1969), but of order 50% remain unbroken at 0 °C. Further increase in temperature involves progressive breaking of water-water H-bonds in the liquid (until eventually they all break and we have a second two-state transition: boiling). Consequently, state B (liquid water) is not in this case a unique state but a continuum of states that merge smoothly and noncooperatively with, if we could see them, differing average structures, extents of H-bonding, and other properties. Similarly, the solid ice phase (state A) will vary with temperature: progressive changes in numbers of lattice defects, thermal disorder, vibrational amplitudes, lattice spacing (due to thermal expansion), and so forth. For example, root-mean-square amplitudes of thermal vibration of atoms in ice I increase from 0.09 to 0.215 Å (for O atoms) or from 0.15 to 0.25 Å (for H atoms) over the -273 to 0 °C temperature range (Eisenberg and Kauzmann, 1969).
In the case of proteins, A and B might be the "native" (N) and "unfolded" (U) states, respectively, and the transition may be brought about by changes in temperature, pH, or denaturant concentration. The U state does not necessarily have to become a random coil, nor even be fully unfolded during the two-state transition, and might continue to change (become "more unfolded") as more denaturant is added or higher temperatures are reached, for example. The important experimental criterion is that there be some perceptible change in some observable property of the system that we might take as a measure of the extent of the transition. For our lump of ice, this might be volume, fluidity, calorimetric enthalpy, and so forth.
For a protein this might be fluorescence, UV absorbance (reflecting environmental changes of aromatic groups), circular dichroism (CD), NMR parameters, calorimetric enthalpy, or others. In any case, experimentally we would measure some quantity ($F$) while varying some parameter ($x$), which might be temperature, pressure, denaturant concentration, or something else, and expect to see the sigmoidal variation typical of a two-state transition (Figure 2). The pre- and posttransition baseline slopes reflect the earlier argument that the properties of A and B themselves are expected to vary with $x$. After suitable correction for this, usually by linear extrapolation, the two-state assumption allows us to estimate the (apparent) equilibrium constant ($K_{app}$) as a function of $x$:
$$K_{app}(x) = [B]/[A] = (F - F_0)/(F_{inf} - F)$$
where the square brackets [ ] indicate molar concentrations (strictly, activities), $F$ is the observed quantity, and $F_0$ and $F_{inf}$ are the (extrapolated) baseline values at low and high $x$ representing pure A (N) or pure B (U) states, respectively. If the experimental variable is the temperature ($T$), then such data, giving $K_{app}$ as a function of $T$, may be used to estimate the van't Hoff enthalpy change ($\Delta H_{vH}$) for the transition.
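The two-state recipe above (straight-line baselines, $K_{app}$ from the observable, then a van't Hoff fit of $\ln K_{app}$ against $1/T$ using Eq. (24)) can be sketched as follows. This is a minimal illustration, assuming that temperature is the variable $x$, that the first and last few points lie on the pre- and posttransition baselines, and that $\Delta C_p$ is negligible across the transition so the van't Hoff plot is straight; the function names and default settings are hypothetical choices for this sketch.

```python
import math

R = 8.314  # J K^-1 mol^-1


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_app(x, F, pre, post):
    """K_app = (F - F0)/(F_inf - F); F0, F_inf from straight-line baselines at x."""
    F0 = pre[0] + pre[1] * x
    F_inf = post[0] + post[1] * x
    denom = F_inf - F
    if denom == 0.0:
        return math.inf  # exactly on the upper baseline: outside any finite window
    return (F - F0) / denom


def vant_hoff_enthalpy(temps, signal, n_base=8, k_window=(0.1, 10.0)):
    """Estimate Delta H_vH (J mol^-1) from a thermal melting curve.

    n_base   -- points at each end treated as pure A or pure B baselines
    k_window -- only K_app values in this range enter the van't Hoff fit
    Both defaults are arbitrary choices for this sketch.
    """
    pre = fit_line(temps[:n_base], signal[:n_base])
    post = fit_line(temps[-n_base:], signal[-n_base:])
    inv_t, ln_k = [], []
    for T, F in zip(temps, signal):
        K = k_app(T, F, pre, post)
        if k_window[0] <= K <= k_window[1]:
            inv_t.append(1.0 / T)
            ln_k.append(math.log(K))
    _, slope = fit_line(inv_t, ln_k)
    return -R * slope  # Eq. (24): slope of ln K against 1/T is -Delta H_vH / R
```

Restricting the fit to $K_{app}$ between about 0.1 and 10 keeps only points well away from either baseline, where small baseline errors matter least. Applied to noise-free simulated two-state data the function should recover the input enthalpy closely; with real, noisy data the choice of baseline regions and fitting window matters far more.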
[Figure 2 plot; horizontal axis: Variable (x).]
Figure 2. Illustration of sigmoidal variation of an experimental observable ($F$) with changing parameter ($x$) for a two-state transition, including pre- and posttransition baseline slopes.
It is pertinent to consider again what is meant by the "van't Hoff enthalpy" in these circumstances and, in particular, how it depends on the size of the system undergoing the two-state transition. Note that $K$ is a dimensionless quantity, and that we do not normally need to know the absolute concentrations of A and B in order to determine it: the ratio of appropriate $F$ values is sufficient. Yet $\Delta H_{vH}$ has the units kJ per mole (or equivalent, i.e., the units of $R$ in the van't Hoff equation). Per mole of what, we may ask? Well, it is per mole of whatever is undergoing the two-state transition, or per mole of the "cooperative unit." This depends on the size of the system. For our block of ice, this would be the enthalpy change for a mole of (identical) ice cubes, since it is the whole ice cube that melts cooperatively. For a protein molecule (or, more strictly, a solution of protein molecules), we might anticipate the cooperative unit to be just the molecule itself since, although individual molecules might unfold cooperatively, the behavior of separate molecules is uncorrelated.
This is illustrated in Figure 3, showing the (sigmoidal) transition with increasing temperature expected for two-state van't Hoff behavior for ice compared to a typical protein with $\Delta H_{vH}$ = 400 kJ mol⁻¹. For a 1-cm cube of ice, the enthalpy (heat) of melting is about 300 J (corresponding to 6 kJ per mole of water), and the resulting transition is extremely sharp. Contrast this with the melting of a (hypothetical) 20 Å cube of ice, about the same size as a protein molecule. Ignoring surface effects, this would require about 2.7 × 10⁻¹⁸ J (6.4 × 10⁻¹⁹ cal) to melt at 0 °C, or 1600 kJ mol⁻¹ (380 kcal mol⁻¹) per mole of cubes, and would give the sigmoidal melting profile shown in Figure 3. Note that this is still a two-state transition. There is no suggestion that the mini-ice cube is at any stage "half-melted," that is, intermediate between liquid and solid. It simply shows that, for small systems, there is a finite range of temperatures over which significant populations of either state may be observed. The bigger the system, the greater the cooperativity and the sharper the transition, until, for everyday macroscopic objects, the transition region is so narrow as to be imperceptible and the transition appears infinitely sharp. This shows that, in the limit, even the most ideal, perfectly cooperative two-state protein transition will have a finite width determined solely by thermodynamic constraints.
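The ice-cube figures follow from simple arithmetic once the cooperative unit is taken to be the whole cube. The sketch below assumes standard handbook values that are not given in the text (enthalpy of fusion 6.01 kJ mol⁻¹, molar mass 18.015 g mol⁻¹, ice density 0.917 g cm⁻³); the function name is illustrative. With the density of ice it gives about 306 J for the 1-cm cube and about 2.4 × 10⁻¹⁸ J, or roughly 1.5 × 10³ kJ per mole of cubes, for the 20 Å cube; taking the density as 1 g cm⁻³ gives 2.7 × 10⁻¹⁸ J and about 1600 kJ mol⁻¹, matching the figures quoted above.

```python
N_A = 6.022e23    # Avogadro's number, mol^-1
M_WATER = 18.015  # molar mass of water, g mol^-1
DH_FUS = 6.01e3   # enthalpy of fusion of ice at 0 C, J per mole of water


def cube_melting_enthalpy(edge_cm, density=0.917):
    """Heat to melt one cube of ice (J), and the enthalpy per mole of cubes.

    The second value is the van't Hoff-type enthalpy for a cooperative unit
    that is the whole cube. density is in g cm^-3 (0.917 for ice I).
    """
    moles_water = edge_cm ** 3 * density / M_WATER
    q_cube = moles_water * DH_FUS
    return q_cube, q_cube * N_A


if __name__ == "__main__":
    for label, edge, rho in (("1 cm", 1.0, 0.917),
                             ("20 A", 20e-8, 0.917),
                             ("20 A, rho = 1", 20e-8, 1.0)):
        q, per_mol = cube_melting_enthalpy(edge, rho)
        print(f"{label:14s} q = {q:.3g} J   per mole of cubes = {per_mol / 1e3:.3g} kJ/mol")
```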
[Figure 3 plot: fractional extent of transition ($F$) against temperature ($T$).]
Figure 3. Sigmoidal van't Hoff transition curves showing fractional extent of the transition ($F$) versus temperature ($T$) for: (A) a hypothetical ice cube, 20 Å per side; (B) a typical protein molecule unfolding at 40 °C with $\Delta H_{vH} \approx$ 400 kJ mol⁻¹ (~100 kcal mol⁻¹).
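The widths of the curves in Figure 3 can be estimated from the integrated van't Hoff relation. The sketch below assumes $\Delta C_p = 0$ across the transition, so that $\ln K = -(\Delta H_{vH}/R)(1/T - 1/T_m)$, and uses the $\Delta H_{vH}$ values from the text: 400 kJ mol⁻¹ with $T_m$ = 40 °C for the protein, 1600 kJ mol⁻¹ at 0 °C for the 20 Å ice cube, and about 1.8 × 10²³ kJ per mole of cubes for the 1-cm cube (300 J × $N_A$). The function names are illustrative. The 10 to 90% widths come out at roughly 9 K, 1.7 K, and of order 10⁻²⁰ K, respectively.

```python
import math

R = 8.314  # J K^-1 mol^-1


def fraction_transformed(T, Tm, dH_vH):
    """Two-state fraction in the high-temperature state, assuming Delta Cp = 0.

    ln K = -(dH_vH/R)(1/T - 1/Tm), the integrated form of Eq. (24) with K(Tm) = 1.
    """
    x = -(dH_vH / R) * (1.0 / T - 1.0 / Tm)
    if x >= 0:  # K/(1+K) evaluated so that exp() cannot overflow
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def width_10_90(Tm, dH_vH):
    """Temperature interval (K) between 10% and 90% transformation.

    From 1/T = 1/Tm -/+ a with a = R ln 9 / dH_vH, rearranged to avoid cancellation.
    """
    a = R * math.log(9.0) / dH_vH
    return 2.0 * a * Tm ** 2 / (1.0 - (a * Tm) ** 2)


if __name__ == "__main__":
    cases = (("protein, 400 kJ/mol", 313.15, 400e3),
             ("20 A ice cube, 1600 kJ/mol", 273.15, 1.6e6),
             ("1 cm ice cube, ~1.8e23 kJ/mol", 273.15, 1.84e26))
    for label, Tm, dH in cases:
        print(f"{label:32s} 10-90% width = {width_10_90(Tm, dH):.3g} K")
```

The closed-form width is used instead of solving for the two temperatures separately because, for the macroscopic cube, $1/T_m \pm a$ cannot be distinguished from $1/T_m$ in double precision.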
THERMODYNAMICS OF UNFOLDING: REVERSIBLE GLOBULAR PROTEINS
Differential Scanning Calorimetry
Unfolding of proteins at elevated temperatures can be followed by a variety of indirect methods which, using the two-state approximate analysis described above, can give information about thermodynamic parameters for the process. Much less ambiguous information, however, is given by calorimetric methods that measure energy changes directly. Differential scanning calorimetry (DSC), pioneered and developed for biomolecular studies by the Sturtevant (1974, 1987), Brandts (Jackson and Brandts, 1970), and Privalov (Privalov and Potekhin, 1986) groups, is most applicable here. In a DSC experiment, a solution of protein (typically 1 mg/ml or less in modern instruments) is heated at a constant rate in the calorimeter cell alongside an identical reference cell containing buffer. Differences in heat energy uptake between the sample and reference cells required to maintain equal temperature correspond to differences in apparent heat capacity, and it is these differences in heat capacity that give direct information about the energetics of thermally induced processes in the sample. A typical DSC thermogram for the unfolding of a simple globular protein is shown in Figure 4.
[Figure 4 plot: DSC traces (A), (B), (C) against Temperature (°C), 40 to 80 °C.]
Figure 4. Typical DSC data for thermal unfolding of a globular protein. (A) Raw data: lysozyme, 3.7 mg/ml (0.26 mM) in 40 mM glycine/HCl buffer, pH 3.0, scan rate 60 °C h⁻¹. (B) Buffer baseline control run under identical conditions. (C) Concentration-normalized $C_p$ data with control baseline subtracted.
Note that, at most times, the heat capacity of the protein solution is lower than the control with buffer alone. This reflects the fact that protein, in common with most organic substances, has a lower heat capacity than liquid water. (Water is, of course, the unusual partner here, since the special features of its extended H-bonded structure endow it with a range of anomalous physical properties, including an unusually high heat capacity.) After correction (by subtraction) of the buffer baseline control, three significant regions are apparent in this DSC trace. At low temperatures ("pretransition") the heat capacity of the protein increases monotonically with temperature in a manner typical of organic solids. As the protein begins to unfold at higher temperatures, the DSC trace becomes more positive, showing the increased apparent heat capacity arising from heat energy uptake in the endothermic unfolding transition. Once this transition is complete, the thermogram reverts to a "posttransition" baseline, reflecting the heat capacity of the now-unfolded protein in solution. This posttransition baseline is characteristically offset from the extrapolated pretransition heat capacity, indicating a positive $\Delta C_p$, and is usually flatter. The shape and area of the transition endotherm contain thermodynamic information about the process.
Most directly, the integrated area beneath the peak in the DSC endotherm, divided by the total amount of protein in the calorimeter cell, gives the calorimetric enthalpy (heat uptake, ΔHcal) for the unfolding transition independently of any model assumptions (apart from interpolation of pre- and posttransition baselines). Depending on how the protein concentration is measured, this might be quoted per mole or per gram of protein. The midpoint temperature of the transition (Tm) is the temperature at which, on average, 50% of the protein molecules are unfolded; in simple cases, it is also the temperature at which the Gibbs free energy of unfolding (ΔGunf) is zero. Uniquely in DSC, a second and independent estimate of the unfolding enthalpy may be made from van't Hoff analysis of the shape of the peak in the DSC thermogram (Jackson and Brandts, 1970; Sturtevant, 1974, 1987; Privalov and Khechinashvili, 1974; Hu et al., 1992). Assuming a two-state transition model, the fractional heat uptake at any stage in the transition may be taken as a measure of the extent of unfolding and, as such, may be used just like any other (indirect) observable parameter to plot the fraction unfolded as a function of temperature. This fraction is an empirical quantity, independent of the sample concentration or
ALAN COOPER
absolute calorimetric enthalpy, and may be used as described earlier to estimate the van't Hoff enthalpy (ΔHvH) of the process. This is the heat uptake per mole of cooperative unit in the transition, and comparison with the directly determined calorimetric enthalpy (ΔHcal) gives information about the size of the cooperative unit or the validity of the two-state assumption. For an ideal, cooperative two-state transition, ΔHvH = ΔHcal, and this holds reasonably well (within 5%) for experiments involving small, simple globular proteins under conditions where their unfolding transition is reversible. Frequently, however, this is not the case (Hu et al., 1992). Situations can arise where ΔHvH > ΔHcal, reflecting a DSC transition that is narrower than would be expected. This might indicate that the cooperative unit is larger than anticipated, owing to specific dimer or higher oligomer formation, for example, in which case the ΔHvH:ΔHcal ratio indicates the number of protein molecules involved in the cooperative unfolding unit. Care must be exercised here, however, since anomalous sharpening or foreshortening of DSC peaks can (and frequently does) arise from irreversible processes such as exothermic aggregation of unfolded protein. Such effects can also have a kinetic component that shows up as a scan-rate dependence of the transition peak shape and position (Sanchez-Ruiz et al., 1988; Galisteo et al., 1991; Lepock et al., 1992). The opposite situation, ΔHvH < ΔHcal, arises when the DSC transition is broader than would be expected for a two-state transition with this particular ΔHcal. This usually reflects a breakdown of the simple two-state model assumption, indicating that unfolding of the protein involves several steps with at least one significantly populated intermediate state.
In some cases, the thermogram might display clear shoulders or separate peaks that can be deconvoluted and correlated with the (possibly independent) unfolding of recognizable domains or subunits of the protein under investigation (Privalov, 1982).

Thermodynamics of Unfolding: Empirical Data
The DSC transitions of a range of small, monomeric globular proteins, including lysozyme, ribonuclease, myoglobin, cytochrome c, chymotrypsin, and ubiquitin, have been extensively studied over the past 20 to 30 years as instrumental techniques have developed, and a consensus view is now emerging, at least for these relatively well-behaved proteins. Under most experimental conditions, the thermal unfolding transitions of these proteins seem to follow cooperative two-state behavior well enough for any build-up of intermediate states in the transition to be ignored (Jackson and Brandts, 1970; Privalov, 1979). Calorimetric (ΔHcal) and van't Hoff (ΔHvH) enthalpies are close to identical within experimental error (i.e., ±5%), which is within the usual uncertainties associated with the protein concentration measurements that are crucial to absolute molar ΔHcal estimates. (Sometimes the ΔHvH:ΔHcal ratios are consistently slightly greater than one, possibly reflecting systematic errors in concentration measurement.)
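The two routes to the unfolding enthalpy compared above (peak area versus peak shape) can be illustrated numerically. The following is a minimal sketch under stated assumptions, not a data-analysis recipe: it generates an ideal, already baseline-subtracted two-state endotherm with a temperature-independent enthalpy (ΔCp = 0), integrates the peak for ΔHcal, treats the fractional heat uptake as the fraction unfolded, and takes ΔHvH from the slope of ln K against 1/T. The function names, the 400 kJ mol⁻¹ enthalpy, the 60 °C midpoint, and the "two molecules unfolding as one unit" case are hypothetical illustrations.

```python
import math

R = 8.314  # gas constant, J K^-1 mol^-1


def excess_cp(T, dH, Tm):
    """Excess heat capacity (J K^-1 mol^-1) of an ideal two-state unfolding
    transition, taking dH as temperature independent (dCp = 0)."""
    K = math.exp(-(dH / R) * (1.0 / T - 1.0 / Tm))
    f = K / (1.0 + K)  # fraction unfolded
    return dH * dH * f * (1.0 - f) / (R * T * T)  # = dH * df/dT


def analyse_endotherm(temps, cp):
    """Return (dH_cal, dH_vH) from a baseline-subtracted endotherm."""
    # Calorimetric enthalpy: model-free area under the peak (trapezoid rule).
    area = [0.0]
    for i in range(1, len(temps)):
        area.append(area[-1] + 0.5 * (temps[i] - temps[i - 1]) * (cp[i] + cp[i - 1]))
    dH_cal = area[-1]
    # van't Hoff enthalpy: take the fractional heat uptake as the fraction
    # unfolded, then fit ln K against 1/T over the central part of the peak.
    xs, ys = [], []
    for T, q in zip(temps, area):
        f = q / dH_cal
        if 0.1 < f < 0.9:
            xs.append(1.0 / T)
            ys.append(math.log(f / (1.0 - f)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return dH_cal, -R * slope  # slope of ln K vs 1/T is -dH_vH/R


temps = [293.15 + 0.05 * i for i in range(1601)]  # 20 to 100 C

# Hypothetical ideal two-state protein: 400 kJ/mol, Tm = 60 C
cp = [excess_cp(T, 400e3, 333.15) for T in temps]
dH_cal, dH_vH = analyse_endotherm(temps, cp)

# Hypothetical pair of molecules unfolding as one cooperative unit (no
# dissociation): 800 kJ per mole of units, area normalized per mole of protein.
cp2 = [0.5 * excess_cp(T, 800e3, 333.15) for T in temps]
dH_cal2, dH_vH2 = analyse_endotherm(temps, cp2)

print(f"single molecule: dH_vH/dH_cal = {dH_vH / dH_cal:.2f}")
print(f"cooperative pair: dH_vH/dH_cal = {dH_vH2 / dH_cal2:.2f}")
```

With these idealized inputs the ratio comes out close to 1 for the single molecule and close to 2 for the cooperative pair, mirroring the ΔHvH > ΔHcal case discussed earlier. Real thermograms also require the pre- and posttransition baselines to be fitted, which this sketch omits.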
As has been apparent for many years from a range of experimental methods including DSC, folded proteins are, in terms of thermodynamic free energy, only marginally stable with respect to their unfolded states. The experimental free energy difference (ΔGunf) between folded and unfolded states under near-physiological conditions is usually in the range +20 to 60 kJ mol⁻¹ (the positive sign reflecting the stability of the native fold). This corresponds to a stabilizing free energy per amino acid residue that is much less than the average thermal energy under these conditions (RT ≈ 2.5 kJ mol⁻¹ at 300 K) and emphasizes the cooperative nature of protein folding (for example, Privalov, 1982, 1992; Murphy and Freire, 1992; Chan et al., 1995): individually, the interactions between amino acids are insufficient to maintain a stable conformation, but taken together in concert they are. For example, with a 100-residue protein, an average value ΔGunf = 40 kJ mol⁻¹ corresponds to a two-state equilibrium constant (K) of about 10⁻⁷ at 25 °C, implying that only one molecule in 10 million is cooperatively unfolded at any one time under these conditions. If, on the other hand, the polypeptide were able to "unravel" one or two residues at a time, the low free energy per residue (≈0.4 kJ mol⁻¹) would allow a significant amount of such unravelling. Presumably it is the strict topological or stereochemical constraints of the folded protein that usually prevent such unconcerted actions, rather like a three-dimensional jigsaw or "Chinese puzzle," where the removal of just one piece is impossible without disrupting the whole. The temperature dependence of ΔGunf shows that for most proteins the folded form is, not unreasonably perhaps, most stable in the physiological temperature range (see Figure 5).
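The numbers quoted above, and the curvature of ΔGunf shown in Figure 5, follow from K = exp(−ΔGunf/RT) together with the standard integrated Gibbs-Helmholtz (Kirchhoff) relations. The sketch below assumes a temperature-independent ΔCp and uses the parameters given in the Figure 5 caption (Tm = 40 °C, ΔHm = 300 kJ mol⁻¹, ΔCp = 9 kJ K⁻¹ mol⁻¹); it locates the temperature of maximum stability (where ΔSunf = 0) and both temperatures at which ΔGunf = 0, by bisection. The helper names are illustrative only.

```python
import math

R = 8.314  # gas constant, J K^-1 mol^-1


def unfolding_thermo(T, Tm, dHm, dCp):
    """Return (dH, dS, dG) of unfolding (J/mol, J/K/mol, J/mol) at T (K),
    assuming dCp is independent of temperature."""
    dH = dHm + dCp * (T - Tm)
    dS = dHm / Tm + dCp * math.log(T / Tm)
    return dH, dS, dH - T * dS


def fraction_unfolded(dG, T):
    """Two-state fraction unfolded, using K = exp(-dG/RT)."""
    K = math.exp(-dG / (R * T))
    return K / (1.0 + K)


def zero_crossings(f, lo, hi, step=0.5):
    """Bisect every sign change of f found on a grid from lo to hi (K)."""
    roots = []
    for i in range(int((hi - lo) / step)):
        a, b = lo + i * step, lo + (i + 1) * step
        if f(a) * f(b) < 0:
            for _ in range(60):
                mid = 0.5 * (a + b)
                if f(a) * f(mid) <= 0:
                    b = mid
                else:
                    a = mid
            roots.append(0.5 * (a + b))
    return roots


# Text example: dG_unf = 40 kJ/mol at 25 C leaves roughly 1 molecule in 10^7 unfolded.
print(f"fraction unfolded: {fraction_unfolded(40e3, 298.15):.1e}")

# Figure 5 parameters: Tm = 40 C, dHm = 300 kJ/mol, dCp = 9 kJ/K/mol
Tm, dHm, dCp = 313.15, 300e3, 9e3
T_max = Tm * math.exp(-dHm / (Tm * dCp))  # dS = 0 here, so dG is at its maximum
dG_max = unfolding_thermo(T_max, Tm, dHm, dCp)[2]
crossings = zero_crossings(lambda T: unfolding_thermo(T, Tm, dHm, dCp)[2], 200.0, 400.0)
print(f"maximum stability {dG_max / 1e3:.1f} kJ/mol at {T_max - 273.15:.1f} C")
print("dG_unf = 0 at", [round(T - 273.15, 1) for T in crossings], "C")
```

With these parameters the upper crossing is simply Tm, while the lower crossing lies well below 0 °C, which is the extrapolated cold-denaturation temperature discussed later in this chapter.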
Variation of ΔGunf with temperature is normally relatively small in the 20-40 °C region, but with significant curvature as ΔGunf falls to negative values at higher temperatures, where the unfolded form becomes the more stable. The midpoint unfolding temperature (Tm) is given by the point at which this curve crosses the ΔGunf = 0 axis. The relatively small free energy of unfolding is usually made up of much larger and much more temperature-dependent enthalpy and entropy contributions. Unfolding is usually endothermic (but not always; see below), with a typical ΔHunf of order +1 kJ mol⁻¹ per residue at 25 °C, but this varies rapidly and becomes increasingly positive (more endothermic) with temperature. This positive ΔHunf is offset by a (usually) positive entropic contribution, ΔSunf, typically of order +2 J K⁻¹ mol⁻¹ per residue at 25 °C, which also increases rapidly with temperature (Figure 5). This strong temperature dependence of ΔHunf and ΔSunf is a consequence of the heat capacity difference, ΔCp, between folded and unfolded states. The heat capacity of the unfolded polypeptide chain, obtained by extrapolation of posttransition DSC baselines or from measurements on chemically unfolded samples (Privalov and Makhatadze, 1990), is higher than that of the folded protein (Figure 4). For the unfolded protein, the heat capacity appears to show relatively little variation with temperature, unlike the folded state, where Cp generally increases
[Figure 5 plot; horizontal axis T (°C)]
Figure 5. Characteristic temperature variation of thermodynamic parameters for unfolding of a small globular protein. Data are calculated for a typical protein unfolding at 40 °C (Tm) with ΔHm = 300 kJ mol⁻¹ and assuming a constant ΔCp = 9 kJ K⁻¹ mol⁻¹. Note how the relatively small unfolding free energy (ΔGunf) is made up of the difference between relatively large enthalpic (ΔHunf) and entropic (TΔSunf) contributions. Temperature variation of ΔCp would show as a curvature of the ΔHunf and TΔSunf lines.
with T (Jackson and Brandts, 1970; Brandts and Lin, 1990). As a consequence, ΔCp itself also varies with temperature, becoming smaller at higher temperatures. A word of caution is needed regarding experimental ΔCp estimates (Hu et al., 1992). Although, in principle, the ΔCp for a protein-unfolding transition may be obtained from the difference between extrapolated pre- and posttransition baselines in a single DSC experiment, in practice, for most instruments, the baselines are neither well enough defined nor extended over a sufficient temperature range to allow confident extrapolation. Consequently, an alternative experimental procedure is frequently adopted in which the Tm of the protein under investigation is varied in separate experiments, usually by variation of experimental pH. Analysis of the variation in ΔHunf with Tm (essentially the slope of the ΔHunf versus Tm plot) gives ΔCp. In cases where comparison can be made, this approach gives ΔCp values consistent with those measured directly from heat capacity baseline extrapolations, but one must remember that it may be that different transitions are being observed
under these differing experimental conditions, and this might make additional contributions to ΔHunf and, therefore, affect the apparent ΔCp. Experiments done at different pH values, for example, will involve unfolding of differently ionized (charged) forms of the protein. It is unclear, at least at first sight, to what extent this will affect the measured heats or ΔCp values. But comparison of the heats of unfolding of lysozyme at different temperatures, obtained by variation in both pH and denaturant (guanidinium chloride) concentration (Privalov, 1979, 1992; Pfeil and Privalov, 1976a,b,c), indicates that the unfolding enthalpy (for lysozyme, at least) is a function only of the temperature and not of how the unfolding is brought about. Consequently, Privalov (1992) has argued that ΔCp values determined in this way should be valid. However, the observation that ΔHunf depends only on temperature and not on pH or denaturant concentration is somewhat unexpected and would imply that the reduction in stability of the folded protein by pH or denaturants is simply an entropic effect.

Cold Denaturation
One significant consequence of a finite positive ΔCp for the unfolding process is that the plot of ΔGunf versus temperature is curved (Figure 5), decreasing on either side of some intermediate temperature of maximum stability. At higher temperatures, ΔGunf eventually becomes negative, describing endothermic thermal unfolding (above). But similar extrapolation on the low-temperature side suggests that, at some sufficiently low temperature, ΔGunf should also become negative, so that the unfolded protein would again be the thermodynamically more stable state. This led to the prediction of exothermic "cold denaturation" of proteins (Brandts, 1964; Franks, 1995), which was widely accepted as evidence for the dominant involvement of hydrophobic interactions in folding stability, since empirically the solubility of nonpolar compounds in water is enhanced at lower temperatures. For most proteins under normal conditions, the extrapolated temperature required for cold denaturation is below the freezing point of water, and different factors are expected to affect the folding stability of proteins in a frozen matrix. But cold denaturation has been observed in a few instances, usually by addition of salts to depress the freezing point of the sample or by addition of denaturants that reduce the stability of the folded protein so that cold denaturation occurs at higher temperatures, above 0 °C. Calorimetric experiments on cold denaturation are technically quite difficult, but the limited information gained so far suggests that cold denaturation behaves like a cooperative unfolding transition, with thermodynamic parameters consistent with estimated extrapolations from high-temperature unfolding data (Privalov, 1990).

Thermodynamics of Unfolding: The Molecular Interpretation
Although the experimental situation regarding protein-folding thermodynamics is now fairly well established, the interpretation of the thermodynamic parameters
at the molecular level has been and remains much more controversial. Despite the numerous reviews that have appeared in recent years, in addition to the classic Kauzmann (1959) article that first gave prominence to hydrophobic interactions, no clear picture has yet emerged. Particularly contentious has been the interpretation of the temperature dependence of the unfolding enthalpies and entropies (ΔHunf and ΔSunf), where much has been made of the supposedly unusual or special "convergence" temperature(s) (usually in the region of 110 °C) at which extrapolated ΔSunf and ΔHunf, when expressed per mole of amino acid residue, were thought to reach similar values for different proteins (Privalov, 1979; Baldwin, 1986; Privalov and Gill, 1988; Murphy et al., 1990; Lee, 1991). It is now acknowledged that much of this speculation was based on over-interpretation of limited data from DSC experiments on a small set of proteins (Makhatadze and Privalov, 1995). More comprehensive analysis of accumulated, more accurate data from an extended range of globular proteins allows a more rational overview. Folding of a protein must overcome the thermodynamically unfavorable loss of conformational entropy associated with the dynamic heterogeneity of the conformationally disordered polypeptide in the unfolded state. Various estimates of this entropy have been made, both from theoretical considerations of the statistics of random-coil polypeptides and from experimental data (Schellman, 1955; Privalov, 1979; Brooks et al., 1988). Values range from 15 to 25 J K⁻¹ mol⁻¹ per residue arising from backbone conformational freedom (φ/ψ rotations, etc.), with additional contributions arising from restriction in side-chain conformational mobility (Doig and Sternberg, 1995).
This corresponds to a free energy (TΔS) of order 6 kJ mol⁻¹ or more per residue that must be overcome by a net negative contribution from changes in interactions between protein and solvent groups, either separately or collectively, in the folding process. The fundamental problem in interpreting protein-folding thermodynamics in terms of the individual molecular interactions between groups in the protein is, of course, that such interactions always involve differences between two states: typically the difference between a group exposed to solvent (water) in the unfolded protein and one buried in the folded form. It is the unavoidable involvement of solvent interactions, and particularly of such a complex solvent as water, that makes analysis so difficult. Take, for example, the hydrogen bond between two protein groups: the N-H···O=C bond between peptide units, say. Such bonds are easily recognized in X-ray diffraction structures of proteins, and it is tempting to assume that they stabilize the structure. However, although H-bonds between buried peptide and other groups undisputedly stabilize the particular protein fold, it is still unclear to what extent they contribute to the overall stability with respect to the unfolded state. This is because, in the unfolded protein, the -NH and -C=O groups (say) are presumably solvated (H-bonded to water molecules). During the folding process, the H-bonds to water must be broken and then replaced by the intramolecular bonds. Hence, in the overall process, taking solvent interactions into account as we must, there is no net gain in the number of hydrogen bonds in the system, though
there will be entropic contributions arising from release of bound water that are less easy to visualize. Experiments with small model compounds seem to support this general picture (Klotz and Farnham, 1968; Kreshek and Klotz, 1969). Indeed, the ubiquitous high solubility of polar, H-bonding compounds in water shows that most groups "prefer" hydrogen bonding to water rather than to other groups, to the extent that model studies usually suggest that H-bonds between groups within proteins make an overall destabilizing contribution to the free energy of folding. That is, although it is energetically unfavorable to leave any H-bonds broken, it is relatively immaterial whether the H-bond is to a water molecule or to another protein group. So, when a protein folds, although all possible hydrogen bonds are probably made, their contribution to the folding free energy may be negligible or even unfavorable. But other interpretations are possible (e.g., Dill, 1990a; Spolar et al., 1992). Similar problems afflict interpretation in terms of the other general kinds of interaction (electrostatic, hydrophobic, van der Waals) usually considered; some aspects of electrostatic interactions are considered further below. Such considerations are usually based on analogies with small organic molecules in solid, liquid, vapor, or solution states, and some success has been achieved in correlating thermodynamic parameters with changes in the accessible surface areas of polar and nonpolar groups on folding (Spolar et al., 1992). But the problem with small-molecule model studies as analogues of the protein-folding process is that such models rarely, if ever, mimic the detailed changes that occur between folded and unfolded proteins. The importance may lie in the detail, to the extent that the best model systems may be the proteins themselves.
The complexities of the interpretative problem, and the ferocity of the arguments involved, are illustrated in two recent articles in the same volume of Advances in Protein Chemistry (Lazaridis et al., 1995; Makhatadze and Privalov, 1995; see, in particular, the epilogues to these chapters) as well as elsewhere (Makhatadze and Privalov, 1996). Continuing a sequence of papers from this group, Makhatadze and Privalov (1995) compare the published thermodynamic data from a range of proteins in detail with their folded structures and attempt to dissect the interactions into their component parts to identify features characteristic of the different contributions. Their argument is too detailed to reproduce here, but in summary they conclude, somewhat surprisingly, that the dominant contribution to the stability of the compact folded protein comes from internal hydrogen bonding and, to a lesser extent, from the van der Waals attractions between closely packed groups within the protein. (Creighton, 1991, came to similar conclusions.) Little contribution appears to come from hydration of aliphatic groups, and burial of aromatic residues appears to be thermodynamically unfavorable, contrary to received wisdom (Kauzmann, 1959; Dill, 1990a). However, identification of the classic "hydrophobic effect" contribution within this scheme is difficult, since Makhatadze and Privalov treat hydration and van der Waals contributions separately and in a way that makes comparison with other models less straightforward.
The numerical self-consistency of the Makhatadze and Privalov (1995) analysis is impressive. But it has to be said that the work is based on numbers extracted or extrapolated from published experimental data that appear, at least in some instances, to be more precise than the original raw experimental data or published figures would justify. It is also fair to say that equally convincing numerical correlations have appeared in the past, based on similar data but with different parameters (for example: Makhatadze and Privalov, 1993; Privalov and Makhatadze, 1990, 1993; Khechinashvili et al., 1995). This probably reflects the sparsity of experimental data compared to the number of free parameters in any model. In marked contrast, in the same volume of Advances in Protein Chemistry, Lazaridis and colleagues (1995), taking a different approach with a less extensive set of experimental data, come to markedly different conclusions regarding both the magnitudes and the signs of the different contributions to stability, supporting the more traditional view (Kauzmann, 1959; Dill, 1990a) that hydrophobic interactions are the primary source of folding stability and hydrogen bonding is the source of specificity of the folded conformation. They also point out some shortcomings in the Makhatadze and Privalov approach that might, for example, lead to overestimation of H-bonded contributions. However, Lazaridis and colleagues (1995) address only the enthalpic contributions to folding at one temperature, and it is not clear in this treatment where the important temperature dependence of enthalpies (ΔCp) arises. Nor have they yet considered the much more difficult but equally important entropic terms. Disagreement between models is often semantic, arising from the different ways in which workers elect to partition contributions under different headings (see, for example, Dill, 1990b; Privalov et al., 1990).
Indeed, this desire to partition between different kinds of interaction may itself be flawed since, although it is understandable and would make contemplation of the problem easier, the various interactions are really, in some ways, just different manifestations of the same overall phenomena and cannot necessarily be separated into individual, independent components. Hydrophobic interactions, for example, are just a manifestation of the hydrogen-bonding properties of water. These same hydrogen bonds are responsible for the solvation of charged and polar groups that dominates the overall thermodynamics of H-bond formation in protein folding. Hydrogen bonds themselves are just a convenient construct: a way of visualizing a particular subset of a larger class of polar interactions arising from permanent dipole/multipole effects. Additionally, all these interactions occur over a background of unavoidable van der Waals interactions, with attractions arising from transient quantum mechanical charge fluctuations (London dispersion forces) and repulsions from the too-close approach of atoms. It is also probably a significant oversimplification to assume that all these interactions are necessarily additive, especially in such a cooperative structure as a folded protein.
Perhaps we are asking rather too much at present when attempting detailed molecular interpretation of the empirical thermodynamic data. Even much simpler systems defy such analysis. The melting of a simple organic solid, for example, is not understood in the same detail that we seem to be demanding for protein unfolding. The reason for this is instructive. Provided the crystal structure is known in sufficient detail, the intermolecular forces between small molecules in the solid can be computed relatively easily; this, in fact, forms the basis for many of the empirical force fields used in molecular mechanics calculations on proteins. But once the crystal melts, we are in unknown territory. So little is known about the structure and dynamics of liquids at the molecular level that it is, as yet, impossible to calculate the thermodynamic parameters (H, S, Cp) ab initio with sufficient confidence to estimate or even rationalize the crucial thermodynamic parameters (ΔH, ΔS, ΔCp) for the melting phase transition. Compare this now with the protein-folding situation. Even though it might be possible to obtain relatively good estimates of the energy of the folded polypeptide, it is the disordered, unfolded state that creates major difficulties. Not only do we have insufficient experimental data to characterize the population of conformational states that defines the unfolded protein, but each of these conformational states comprises a heterogeneous mixture of different molecular groups immersed in water, which is a complicated enough molecular liquid in its own right. Interestingly, it is differences in assumptions regarding the nature of the unfolded polypeptide that lead, at least in part, to divergences in interpretation between Lazaridis and colleagues (1995), Makhatadze and Privalov (1995), and others. In such circumstances, it is probably wise to regard detailed theoretical descriptions of the thermodynamic contributions to protein folding with some circumspection.
EFFECT OF LIGAND-BINDING ON FOLDING THERMODYNAMICS

Le Chatelier's principle implies that if any ligand (small molecule or other protein or macromolecule) binds preferentially to the folded protein, then this will stabilize the folded state, and unfolding will become progressively less favorable as ligand concentration increases. Conversely, ligands that bind preferentially to the unfolded protein will destabilize the fold and encourage unfolding. Examples of both are seen (Sturtevant, 1987; Fukada et al., 1983; Cooper, 1992; Cooper and McAuley-Hecht, 1993). The general case of multiple ligands and multiple protein subunits has been considered by Sturtevant (Fukada et al., 1983; Sturtevant, 1987). For a simple case in which a ligand molecule (L) binds specifically only to the native folded protein (N), the following equilibria apply:

Ligand binding: $$N + L \rightleftharpoons NL, \qquad K_{LN} = \frac{[N][L]}{[NL]} \qquad (28)$$
Unfolding: $$N \rightleftharpoons U, \qquad K_0 = \frac{[U]}{[N]} \qquad (29)$$
where KLN is the dissociation constant for ligand binding to the native protein and K0 is the unfolding equilibrium constant for the unliganded protein. In the presence of ligand, the effective unfolding equilibrium constant (Kapp) is given by the ratio of the total concentrations of unfolded to folded species:

$$K_{app} = \frac{[U]}{[N] + [NL]} = \frac{K_0}{1 + [L]/K_{LN}} \approx \frac{K_0 K_{LN}}{[L]} \qquad (30)$$
where the approximate form holds at high free ligand concentrations ([L] ≫ KLN). This shows that Kapp decreases, and the folded form becomes more stable, with increasing ligand concentration. Expressed in free energy terms:

$$\Delta G_{unf} = -RT\ln K_{app} = \Delta G_{unf,0} + RT\ln\left(1 + \frac{[L]}{K_{LN}}\right) \approx \Delta G_{unf,0} + \Delta G^\circ_{diss,N} + RT\ln[L] \quad (\text{for high } [L]) \qquad (31)$$
where ΔGunf,0 is the unfolding free energy of the unliganded protein, and ΔG°diss,N = −RT ln KLN is the standard Gibbs free energy for dissociation of the ligand from its binding site on the native protein. Thus the stabilizing effect of bound ligand can be visualized as arising from the additional free energy required to remove the ligand prior to unfolding, together with an additional contribution (RT ln[L]) from the entropy of mixing of the freed ligand with the bulk solvent. In the high ligand concentration limit, the free energy can be separated into enthalpy and entropy contributions, thus

$$\Delta H_{unf} \approx \Delta H_{unf,0} + \Delta H^\circ_{diss,N} \qquad (32)$$

and

$$\Delta S_{unf} \approx \Delta S_{unf,0} + \Delta S^\circ_{diss,N} - R\ln[L] \qquad (33)$$
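Equations 30 to 33 can be checked numerically. The sketch below is a minimal illustration assuming a single site that binds ligand only in the native state; the temperature, free energies, enthalpies and dissociation constant are hypothetical values, and the function names are illustrative. It compares the exact form of equation 31 with its high-[L] limit and confirms that the enthalpy and entropy terms of equations 32 and 33 recombine to the same free energy.

```python
import math

R = 8.314   # gas constant, J K^-1 mol^-1
T = 298.15  # K


def dG_unf_exact(dG0, L, K_LN):
    """Eq. 31, exact form: ligand (free concentration L, M) binds only N."""
    return dG0 + R * T * math.log1p(L / K_LN)


def dG_unf_high_L(dG0, L, K_LN):
    """Eq. 31, high-[L] limit: dG0 + dG_diss,N + RT ln[L]."""
    dG_diss = -R * T * math.log(K_LN)  # standard free energy of dissociation
    return dG0 + dG_diss + R * T * math.log(L)


def dH_dS_unf_high_L(dH0, dS0, dH_diss, dS_diss, L):
    """Eqs. 32 and 33: the dissociation enthalpy adds to dH_unf, while the
    entropy also picks up the -R ln[L] mixing term for the released ligand."""
    return dH0 + dH_diss, dS0 + dS_diss - R * math.log(L)


# Hypothetical protein and ligand
dG0, K_LN = 20e3, 1e-5  # J/mol, M
for L in (1e-6, 1e-5, 1e-3, 1e-1):
    print(f"[L] = {L:.0e} M: exact {dG_unf_exact(dG0, L, K_LN) / 1e3:.2f} kJ/mol, "
          f"high-[L] limit {dG_unf_high_L(dG0, L, K_LN) / 1e3:.2f} kJ/mol")

# Split into enthalpy and entropy at high [L] (hypothetical dH0 and dH_diss)
dH0 = 250e3
dS0 = (dH0 - dG0) / T
dH_diss = 15e3
dS_diss = (dH_diss - (-R * T * math.log(K_LN))) / T
dH, dS = dH_dS_unf_high_L(dH0, dS0, dH_diss, dS_diss, 1e-1)
print(f"dH - T dS = {(dH - T * dS) / 1e3:.2f} kJ/mol")
```

At low [L] the high-[L] limit is clearly wrong, whereas at [L] ≫ KLN the two forms converge and each tenfold increase in [L] adds about RT ln 10 to ΔGunf, the entropy-of-mixing effect described in the text.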
For small ligands, the heat of dissociation (ΔH°diss,N) can be quite small compared to the heat of unfolding of the protein and may be hard to distinguish in calorimetric unfolding experiments, particularly when ΔHunf is in any case varying with temperature due to ΔCp effects. Entropy effects, particularly those arising from the ligand mixing term (R ln[L]), will be much more apparent in such cases. (Slightly more complex but manageable expressions, corrected for the fraction of unliganded protein in the mixture, apply at lower ligand concentrations. In such cases, the thermodynamic parameters have values intermediate between the unliganded and fully liganded values given above.) Similar considerations apply in situations where ligand binds only to the unfolded protein (Cooper, 1992; Cooper and McAuley-Hecht, 1993):
[Figure 6 plot; horizontal axis Temperature (°C)]
Figure 6. DSC traces showing the effect of increasing α-cyclodextrin concentrations (0-15% w/v) on the thermal unfolding of lysozyme (40 mM glycine buffer, pH 3.0). Note the progressive reduction in both Tm and apparent ΔCp.
$$U + L \rightleftharpoons UL, \qquad K_{LU} = \frac{[U][L]}{[UL]} \qquad (34)$$
in which case

$$K_{app} = \frac{[U] + [UL]}{[N]} = K_0\left(1 + \frac{[L]}{K_{LU}}\right) \approx \frac{K_0[L]}{K_{LU}} \qquad (35)$$
and

$$\Delta G_{unf} = -RT\ln K_{app} = \Delta G_{unf,0} - RT\ln\left(1 + \frac{[L]}{K_{LU}}\right) \approx \Delta G_{unf,0} - \Delta G^\circ_{diss,U} - RT\ln[L] \quad (\text{for high } [L]) \qquad (36)$$
which in this case shows the destabilizing effect, a reduction in unfolding free energy, as ligand binds to the unfolded polypeptide. Equivalent expressions for the enthalpy and entropy contributions may be written as above with appropriate sign changes. An example of this kind of effect is illustrated in Figure 6 for the unfolding of globular proteins in the presence of cyclodextrins. These toroidal oligosaccharide molecules form inclusion complexes with small nonpolar molecules and therefore bind to exposed aromatic groups on the unfolded protein (Cooper, 1992). (Note: the apparent variation in ΔHcal is predominantly due to the inherent variation of unfolding enthalpy with temperature (the ΔCp effect) rather than to ligand binding per se.) The effect of ligand binding (either to N or to U) on the Tm of the protein can be generalized and approximated in the case of weakly binding ligands (Cooper and McAuley-Hecht, 1993) to give
$$\frac{\Delta T_m}{T_m} = \pm\frac{nRT_{m0}}{\Delta H_{unf}}\ln\left(1 + \frac{[L]}{K_L}\right) \qquad (37)$$
where ΔTm = Tm − Tm0 is the shift in unfolding transition temperature, ΔHunf is the unfolding enthalpy, and n is the number of ligand binding sites on the protein (assumed identical). The ± sign depends on whether the ligand stabilizes the folded (+) or the unfolded (−) form. At low concentrations of weakly binding ligands ([L]/KL ≪ 1), this becomes approximately linear in ligand concentration:

$$\Delta T_m \approx \pm\frac{nRT_{m0}^{2}[L]}{K_L\,\Delta H_{unf}} \qquad (38)$$
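Under the further assumptions of ΔCp = 0 and n identical, independent sites, the condition ΔGunf(Tm) = 0 can be solved for Tm in closed form, which gives a quick way to explore equations 37 and 38. The sketch below is a hypothetical illustration (the function names, Tm0, ΔHunf and all dissociation constants are invented): it compares the closed-form shift with the linear approximation of equation 38, and contrasts a ligand that binds only the native state, whose effect on Tm keeps growing with ln[L], with one that binds both states with different affinities, whose effect levels off.

```python
import math

R = 8.314  # gas constant, J K^-1 mol^-1


def tm_with_ligand(Tm0, dH, L, K_N=math.inf, K_U=math.inf, n=1):
    """Tm (K) when n identical, independent sites bind ligand (free conc. L, M)
    in the native (K_N) and/or unfolded (K_U) state. Assumes dCp = 0, so
    dG_unf(T) = dH (1 - T/Tm0) + n R T [ln(1 + L/K_N) - ln(1 + L/K_U)]."""
    g = n * (math.log1p(L / K_N) - math.log1p(L / K_U))
    return Tm0 / (1.0 - R * Tm0 * g / dH)


def tm_shift_linear(Tm0, dH, L, K, n=1, sign=+1):
    """Eq. 38: dTm ~ +/- n R Tm0^2 [L] / (K dH), valid for [L]/K << 1."""
    return sign * n * R * Tm0 ** 2 * L / (K * dH)


Tm0, dH = 333.15, 400e3  # hypothetical protein: Tm0 = 60 C, dH_unf = 400 kJ/mol

# Weak binder of the unfolded chain only (K_U = 1 M) at 10 mM
exact = tm_with_ligand(Tm0, dH, 0.01, K_U=1.0) - Tm0
linear = tm_shift_linear(Tm0, dH, 0.01, 1.0, sign=-1)
print(f"closed form {exact:.4f} K, eq. 38 {linear:.4f} K")

# Native-only binder versus one that also binds U (K_N = 0.1 mM, K_U = 10 mM)
for L in (1e-3, 1e-2, 1e-1, 1.0):
    native_only = tm_with_ligand(Tm0, dH, L, K_N=1e-4) - Tm0
    both_states = tm_with_ligand(Tm0, dH, L, K_N=1e-4, K_U=1e-2) - Tm0
    print(f"[L] = {L:g} M: dTm = {native_only:5.1f} K (N only), "
          f"{both_states:5.1f} K (N and U)")
```

The large shifts in the second loop go well beyond the range where ΔCp = 0 is a reasonable approximation; they are meant only to show the qualitative difference between the two binding patterns.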
Note that the Tm shift continues with increasing ligand concentration even beyond levels where the protein is fully ligand-bound. This is a manifestation of the dominant entropy-of-mixing contribution described above. Cases do arise, however, where the Tm shift does plateau at higher ligand concentrations. This usually signifies binding of L to both N and U, albeit with different affinities. For example, a particular ligand might bind strongly to the native protein but less well to the unfolded chain. In such cases, the Tm would shift upward with increasing [L] until the concentration is such that both N and U are fully liganded. A recent example of this is α-lactalbumin (Robertson, Cooper, and Creighton, in preparation), a specific calcium-binding protein in which increasing [Ca²⁺] increasingly stabilizes the native protein up to a limit where weak, nonspecific calcium ion binding to the unfolded chain sets in. Analysis of more complex situations involving multiple ligand binding or more tightly binding ligands is generally less straightforward, but the same basic principles apply. (For details, see Sturtevant, 1987; Brandts and Lin, 1990.)

Effect of pH

The effect of varying pH on the stability of protein folding is just a special case of the ligand-binding consequences described above. In this case, the ligands are aqueous hydrogen ions (H⁺) that bind to specific protein sites (acidic or basic groups) in both folded and unfolded states. Only if the proton binding affinities differ between the two states will pH have any effect on stability. Consider, for simplicity, proton binding to a single group on the polypeptide. The acid-base equilibria for folded and unfolded states may be described by:

$$N + H^+ \rightleftharpoons NH^+, \qquad K_{A,N} = \frac{[N][H^+]}{[NH^+]} \qquad (39)$$
$$U + H^+ \rightleftharpoons UH^+, \qquad K_{A,U} = \frac{[U][H^+]}{[UH^+]} \qquad (40)$$
The apparent or effective equilibrium constant for protein unfolding in this case is given by:

$$K_{app} = \frac{[U] + [UH^+]}{[N] + [NH^+]} = K_0\,\frac{1 + [H^+]/K_{A,U}}{1 + [H^+]/K_{A,N}} \qquad (41)$$
where $K_0 = [\mathrm{U}]/[\mathrm{N}]$ is the unfolding equilibrium constant for the unprotonated species. It follows that the stability of the folded protein (with respect to unfolded) can be affected by changes in pH only if $K_{A,N} \neq K_{A,U}$. The pH dependence in more realistic situations with multiple ionizable groups is somewhat more complex, but the general principle still applies: changes in pH can affect folding stability only if the ionizable group(s) have different pKA values in the folded and unfolded states. It also follows that, in regions where the stability of the folded protein is sensitive to pH, the folding ⇌ unfolding transition must be accompanied by an uptake or release of hydrogen ions. Using the general theory of linked thermodynamic functions (Wyman, 1964; Wyman and Gill, 1990), the mean change in the number of H⁺ ions bound when the protein unfolds is given by

$$\delta n_{\mathrm{H}^+} = -\frac{d \log K_{\mathrm{app}}}{d\,\mathrm{pH}} \tag{42}$$
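Equations (41) and (42) are straightforward to evaluate numerically. The sketch below is a minimal illustration for the single-ionizable-group model only; the function names are hypothetical, the derivative is taken by a central finite difference (an illustrative choice), and the pKA values in the demonstration (3.0 folded, 4.0 unfolded, mimicking a carboxylate stabilized in the native state) are assumptions, not data for any real protein.

```python
import math


def k_app(ph: float, k0: float, pka_n: float, pka_u: float) -> float:
    """Apparent unfolding constant of Eq. (41) for a single ionizable group.

    k0 is [U]/[N] for the unprotonated species; pka_n and pka_u are the
    group's pKA values in the folded (N) and unfolded (U) states.
    """
    h = 10.0 ** (-ph)          # [H+], mol/L
    ka_n = 10.0 ** (-pka_n)    # K_A,N
    ka_u = 10.0 ** (-pka_u)    # K_A,U
    return k0 * (1.0 + h / ka_u) / (1.0 + h / ka_n)


def proton_uptake(ph: float, k0: float, pka_n: float, pka_u: float,
                  step: float = 1e-4) -> float:
    """Mean H+ uptake on unfolding, Eq. (42): -d log10(K_app) / d pH."""
    upper = math.log10(k_app(ph + step, k0, pka_n, pka_u))
    lower = math.log10(k_app(ph - step, k0, pka_n, pka_u))
    return -(upper - lower) / (2.0 * step)


if __name__ == "__main__":
    # Hypothetical group: pKA 3.0 when folded, 4.0 when unfolded.
    for ph in (2.0, 3.0, 3.5, 4.0, 5.0, 7.0):
        print(f"pH {ph:4.1f}  K_app = {k_app(ph, 1e-6, 3.0, 4.0):.3e}  "
              f"dn_H+ = {proton_uptake(ph, 1e-6, 3.0, 4.0):.3f}")
```

For this simple model the derivative equals the difference in fractional protonation of the group between the unfolded and folded states, which gives a convenient check; the uptake vanishes when the two pKA values are equal, in line with the condition stated above.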
Shifts in pK (δpK) correspond to changes in the standard free energy of proton ionization of the group (δΔG°ion), which are numerically related by

$$\delta\Delta G^\circ_{\mathrm{ion}} = 2.303\,RT \cdot \delta\mathrm{pK} \tag{43}$$
where R is the universal gas constant (8.314 J K⁻¹ mol⁻¹) and T is the absolute temperature. This corresponds to almost 6 kJ mol⁻¹ per unit shift in pK at room temperature. Figure 7 illustrates the effect of pH on the thermal unfolding of a simple globular protein (lysozyme) as seen in DSC experiments. The major change in Tm in the low-pH region occurs over the pH 2 to 3 range; this is consistent with protonation of carboxylate side chains, and the variation corresponds to a maximal uptake (δnH⁺) of about three hydrogen ions during unfolding of this protein under these conditions. This does not, of course, imply that there are three specific titrating groups responsible for this behavior, but rather that this is the cumulative effect of all participating groups.

Electrostatic Interactions

Changes in group pKA can be brought about by variations in the effective polarity (dielectric constant) of the environment, for example as a result of burial of residues within the folded protein, or by electrostatic interactions with other charged groups (Stigter and Dill, 1990; Yang et al., 1993; Antosiewicz et al., 1994; and references therein). All these factors are likely to change when a protein unfolds, so it is not unexpected that pKA values might differ between the two states. In some cases the pKA shifts can be quite large (3 to 4 pK units), for example in specific instances involving short-range electrostatic interactions or burial in nonpolar locations, usually for residues with important catalytic or other specific functions. Generally, however, the pK shifts for most residues are much smaller than this, since most charged groups are usually found close to the outer surface of the folded protein, and only relatively small changes in electrical environment occur on unfolding. Nevertheless, the accumulation of small pKA shifts from a large number of such groups will make a considerable contribution, and the folding stability of most proteins is therefore sensitive to pH.

Figure 7. Effect of pH on the thermal unfolding of lysozyme in the DSC (horizontal axis: temperature, °C). The inset shows the variation in Tm with pH for this protein.

Exact calculation of the electrostatic properties of proteins is a complex and computationally intensive problem (Stigter and Dill, 1990; Yang et al., 1993; Antosiewicz et al., 1994). Simple Coulomb interaction models can, however, give an interesting and perhaps somewhat unexpected view of the complexity of the thermodynamics of charged groups in proteins. For example, assuming point charges and a uniform dielectric medium, the electrostatic free energy (δGel) between two charges, q1 and q2, separated by a distance R12 is given by the classic Coulomb energy, $q_1 q_2 / (4\pi\varepsilon_0 \varepsilon R_{12})$, where ε0 is the permittivity of free space and ε is the relative dielectric constant of the medium around the charges. This can be viewed as the work done, and thus the free energy change, in bringing these charges together from infinity to a separation R12. For singly charged groups, and with R12 expressed in ångström units (Å), this can be written:

$$\delta G_{\mathrm{el}} = \pm\frac{1380}{\varepsilon R_{12}}\ \mathrm{kJ\,mol^{-1}} \tag{44}$$
where the ± sign depends on whether the interaction is attractive (opposite charges, negative δGel) or repulsive (like charges, positive δGel). For charged groups separated by, say, 5 Å, this amounts to about 3.5 kJ mol⁻¹ in water at 25 °C (dielectric constant about 80), corresponding to a combined pK shift of about 0.6 pK units. However, in a much lower dielectric environment such as the interior of a protein (ε ≈ 2.5 to 4; Gilson and Honig, 1986), this can rise to δGel ≈ 100 kJ mol⁻¹ and (probably unrealistically) a combined δpK in excess of 12.

Burial of individual charged groups within the nonpolar environment of a folded protein is generally energetically unfavorable. Again assuming a continuous dielectric, the free energy of transfer of a single spherical charge q of radius r from medium 1 to medium 2 is given by

$$\delta G_{\mathrm{trans}} = \frac{q^2}{8\pi\varepsilon_0 r}\left(\frac{1}{\varepsilon_2} - \frac{1}{\varepsilon_1}\right) = \frac{690}{r}\left(\frac{1}{\varepsilon_2} - \frac{1}{\varepsilon_1}\right)\ \mathrm{kJ\,mol^{-1}} \tag{45}$$

with r in Å for a single charge in the latter form. Taking a representative atomic radius (r = 2 Å) with ε1 = 80 and ε2 = 4, which might be typical for burial of a group within a protein, gives δGtrans ≈ 80 kJ mol⁻¹. Calculations such as these are simplistic: the continuum dielectric model is unrealistic at the atomic level, and we have ignored screening and other effects due to buffer electrolytes, for example. Nevertheless, they do illustrate the potential importance of charge interactions to folding stability, and these are the sorts of numbers that come out of more rigorous calculations and from experiment (e.g., Dao-pin et al., 1991).

The partitioning of these electrostatic free energies into enthalpy and entropy components is also complicated. For any given geometry, the temperature dependence of the electrostatic free energy will depend on the temperature dependence of ε. Interestingly, since dielectric constants generally decrease with increasing temperature, at least in fluid environments, an electrostatic attraction between two groups actually gets stronger in free energy terms the higher the temperature. Thermodynamically, this would imply a positive ΔS contribution to the attractive free energy between oppositely charged groups, which can be rationalized in terms of the dipole-orientation entropy of molecules in the dielectric medium. Model studies of electrostatic interactions in salts or solutions bear out the complexity of the thermodynamics of such interactions, which may be endothermic or exothermic, and entropy-driven or not, as the case may be. The complex electrostatic properties of real proteins have received detailed attention only relatively recently (Gilson and Honig, 1986; Stigter and Dill, 1990; Yang et al., 1993; Antosiewicz et al., 1994), and the breakdown into enthalpy and entropy contributions is still unclear.

Denaturants and Osmolytes
There is still considerable discussion regarding the mechanism of protein unfolding by chemical denaturants such as urea, guanidinium chloride, and so forth.
Possibly the effect arises from (weak) binding of these molecules to groups on the unfolded protein, which would destabilize the folded form in the manner described above for other ligand-binding situations (Makhatadze and Privalov, 1992). Alternatively, it has been suggested that the effect is more indirect, resulting from changes in solvent structure or in hydration/solvation of the protein, especially at the high concentrations at which these chemical denaturants are effective (Schellman, 1987a,b; Timasheff, 1992). Regardless of the detailed mechanism, denaturation by high concentrations of urea, guanidinium chloride, or other highly water-soluble compounds has long been recognized as a useful empirical tool. It is widely used in studies of site-directed mutagenesis effects on protein stability (e.g., Matouschek et al., 1994; Serrano et al., 1992; Fersht et al., 1992), where it has been particularly effective in estimating the (usually small) changes in folding free energy brought about by amino acid replacements or other minor modifications. The procedure is based on extrapolation of free energy and other data obtained over a range of denaturant concentrations. Typically the extent of unfolding at different urea or GuHCl concentrations might be measured by CD, fluorescence, or another technique and converted to a ΔGunf using a two-state assumption as described earlier. These data correspond, of course, to unfolding free energies at relatively high denaturant concentrations (e.g., 2 to 8 M) and are not necessarily related to more physiological conditions. Empirically, however, ΔGunf is found to vary almost linearly with denaturant concentration and can be extrapolated to zero concentration to give an estimate in the absence of denaturant.
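The two-state conversion and linear extrapolation just described can be sketched in a few lines. This is a minimal illustration, assuming a two-state transition and the linear model ΔGunf = ΔGunf(water) − m[D], where the slope m is conventionally called the m-value; the function names are hypothetical, and the "data" in the demonstration are synthetic values generated from the model itself (ΔGunf(water) = 25 kJ mol⁻¹, m = 6 kJ mol⁻¹ M⁻¹), not measurements.

```python
import math

R_KJ = 8.314e-3  # gas constant, kJ K^-1 mol^-1


def dg_from_fraction(f_unfolded: float, temp_k: float = 298.15) -> float:
    """Two-state unfolding free energy, -RT ln K with K = f_U / (1 - f_U)."""
    if not 0.0 < f_unfolded < 1.0:
        raise ValueError("fraction unfolded must lie strictly between 0 and 1")
    return -R_KJ * temp_k * math.log(f_unfolded / (1.0 - f_unfolded))


def linear_extrapolation(conc_m, dg_kj):
    """Ordinary least-squares fit of dG = dG_water - m*[D].

    Returns (dG_water, m); dG_water is the value extrapolated to zero denaturant.
    """
    n = len(conc_m)
    x_bar = sum(conc_m) / n
    y_bar = sum(dg_kj) / n
    sxx = sum((x - x_bar) ** 2 for x in conc_m)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(conc_m, dg_kj))
    slope = sxy / sxx
    return y_bar - slope * x_bar, -slope


if __name__ == "__main__":
    # Synthetic transition-region points from the assumed model at 298.15 K.
    rt = R_KJ * 298.15
    conc = [3.2, 3.6, 4.0, 4.4, 4.8, 5.2]
    fractions = []
    for c in conc:
        k = math.exp(-(25.0 - 6.0 * c) / rt)
        fractions.append(k / (1.0 + k))
    dg = [dg_from_fraction(f) for f in fractions]
    dg_water, m_value = linear_extrapolation(conc, dg)
    print(f"dG(water) = {dg_water:.2f} kJ/mol, m = {m_value:.2f} kJ/mol/M")
```

Only points within the transition region (fraction unfolded well away from 0 and 1) give usable free energies, which is why the fitted range typically covers a narrow band of denaturant concentrations and the extrapolation to zero is long.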
This extrapolation is quite long, and concern has been expressed about its validity, but detailed comparisons of this method with more direct calorimetric determinations show remarkably good agreement (Hu et al., 1992; Santoro and Bolen, 1992; Matouschek et al., 1994; Johnson and Fersht, 1995), though the extrapolations are not always linear and care has to be taken to maintain a sufficiently high salt concentration in the case of GuHCl denaturation.

Addition of alcohols and other miscible solvents also generally reduces the stability of proteins in water. The thermodynamics of this effect (Velicelebi and Sturtevant, 1979; Woolfson et al., 1993) are consistent with what might be expected from a reduction in hydrophobic interactions resulting from the reduced polarity of the solvent environment of the unfolded polypeptide. Detailed analysis, however, is complicated by the inevitable effect that such drastic solvent changes will have on the conformational population of the unfolded chain, which is even less likely to be a "random coil" in organic solvent mixtures.

Osmolytes, on the other hand, are a range of water-soluble compounds that, at relatively high concentrations and in contrast to denaturants, stabilize globular proteins against thermal unfolding (Santoro et al., 1992). Such effects are biologically important in organisms subjected to heat, dehydration, or other environmental stress, where a range of naturally occurring osmolytes, including sugars, polyhydric alcohols, amino acids, and methylamines, might protect against protein denaturation (Yancey et al., 1982). Glycine-based osmolytes such as sarcosine (at 8.2 M) give increases in Tm of up to 23 °C with small globular proteins, for example (Santoro et al., 1992). The mechanism of osmolyte stabilization of folded proteins remains unclear.

"MOLTEN GLOBULES" AND OTHER NONNATIVE STATES

The thermodynamic properties of molten globules and other nonnative protein conformations are difficult to establish unambiguously (Privalov, 1996). This is partly because the states themselves are difficult to define, and in only relatively few instances can experimental conditions be found that stabilize significant populations of such species. Also, by their very nature, such states lack the cooperativity characteristic of folding to the compact native conformation. This means that the two-state model is rarely applicable to transitions to or from the molten globule state. Instead, changes in temperature or other experimental variables usually give rise to continuous changes in properties, in accordance with a more gradual shift in conformational population. In such situations, only calorimetric methods can give unambiguous thermodynamic data, and even here the data are sparse. DSC experiments on the thermal unfolding of the "acid molten globule state" of apo-myoglobin, for example (Griko and Privalov, 1994; Makhatadze and Privalov, 1995), show only a gradual uptake of heat and a broad, sigmoidal increase in heat capacity with temperature, with none of the cooperative endothermic heat capacity discontinuity seen for the true native protein at higher pH. Similar results are found with α-lactalbumin (Griko et al., 1994) and other proteins, though comparative discussion is often hampered by the lack of an agreed definition and characterization of these states. In such a situation, it is fair to ask whether the molten globule is really such a well-defined state. Ptitsyn (1995, and earlier references therein) has argued strongly that it is. However, the lack of any well-defined thermal transition suggests the more general view that we are seeing just variation in a continuum of conformationally heterogeneous states under conditions where the native fold is only marginally stable.
Observation of molten globule states typically requires low pH (pH 2 to 4), the absence of a cofactor or ligand (e.g., apo-α-lactalbumin lacking bound Ca²⁺; apo-myoglobin lacking the heme group), and sometimes the addition of low concentrations of denaturant (alcohols, GuHCl, etc.). Under such conditions the protonation of acidic residues and the lack of stabilizing ligand interactions will tend to destabilize the native fold. Yet, particularly at low temperatures, there will be sufficient residual interactions between residues to support clustering of conformations in more compact states, possibly even resembling the native state in secondary structure content and other properties (Griko et al., 1994). With increasing temperature or harsher pH/denaturant conditions, however, the conformational heterogeneity will gradually expand to more open states, spanning greater regions of conformational space. In such a broad continuum of conformationally heterogeneous states, it is a matter of taste or experimental convenience where one draws the line between "native," "molten globule," "partially folded," and "unfolded" states. Moreover, different experimental techniques will probe different aspects of these conformational populations and may give conflicting views. See Privalov (1996) for a critical review.
REVERSIBILITY

Central to all of the thermodynamic discussion, and to most experimental determinations of thermodynamic parameters for folding transitions, is the assumption that the process under investigation is reversible: that is, on the time scale of the experiment, the system is at equilibrium and the concentrations of all molecular species present are determined by thermodynamics and not kinetics. This is frequently not the case and can be a particular problem in experiments involving thermal unfolding (DSC, for example), where exposure of the unfolded polypeptide to relatively high temperatures can bring about a variety of physical and chemical changes that affect the reversibility of folding and can prejudice the results unless carefully controlled. Chemical changes such as proline isomerization, disulphide interchange, oxidation, and spontaneous deamidation of Asn and Gln residues, for example, are all possible and will alter the folding properties of the polypeptide. Aggregation or precipitation of the unfolded polypeptide is also common at high temperatures or in certain solvent mixtures.

In calorimetric experiments, such irreversible processes can be recognized by their effects on the thermogram. Figure 8, for example, shows a series of repeated DSC traces for the thermal unfolding of lysozyme in which the sample is simply cooled back to room temperature after each scan. Although the major, native transition at about 74 °C is apparent throughout, each successive heating/cooling cycle sees the appearance of two (or more) transitions at lower temperatures, together with a decrease in magnitude of the main transition. These less stable species are probably misfolded (incorrectly folded) forms of the polypeptide brought about by the build-up of chemical changes (proline isomerization, side-chain deamidation) with repeated unfolding and exposure to high temperature (Cooper and Nutley, unpublished). Although proline isomerization is reversible in principle (Stein, 1993; Schmid et al., 1993), it is likely to be slow on the timescale of these experiments, so that on cooling the polypeptide gets trapped with the wrong proline conformers. Lysozyme has two proline residues in its amino acid sequence, so four different cis/trans combinations are possible in principle, though both are trans in the native conformation. It is interesting, but by no means yet conclusive, to note the appearance of four possible misfolded species in the DSC experiment (Figure 8). Disulphide effects are not likely here, since the process appears unaffected by the presence of reducing agent (DTT). This contrasts with another example, in which we have shown that a time-dependent irreversible effect on the folding of the methionine repressor protein, MetJ, can be totally eliminated by addition of DTT to the sample buffer (Johnson et al., 1992). Figure 9 shows a series of repeat DSC scans of MetJ giving a progressive decrease in magnitude with each heat/cool cycle.

Figure 8. Repeat DSC scans of thermal unfolding of lysozyme (3.12 mg/ml, 0.1 M glycine/HCl, pH 3.4) showing possible accumulation of misfolded forms (horizontal axis: temperature, 20 to 100 °C). Scan rate was 60 °C h⁻¹ with 60 min cooling between scans.
No misfolded species are apparent here, nor is there any evidence of thermal aggregation of the protein, but the effect depends on the amount of time the polypeptide is kept in the unfolded state at high temperatures and appears to be related to disulphide exchange, since it can be suppressed by addition of DTT. In the absence of reducing agents, the loss of refolding capacity follows roughly first-order kinetics in time above the unfolding temperature (Figure 9). In the case of MetJ, explanation of this effect is relatively straightforward. MetJ is a dimeric protein, and each monomer contains one buried cysteine (-SH) residue whose function is (as yet) unknown, but which remains reduced in the native dimer structure. Upon unfolding and under oxidizing conditions, the formation of intermolecular S-S crosslinks between these cysteines is likely, giving nonnative crosslinked dimers that are unable to fold correctly. (It is tempting to speculate that such nonnative, crosslinked dimers might actually be transient intermediates in the folding pathway of this dimeric protein under the reducing conditions found within the cell, because this would facilitate correct juxtaposition of the monomers prior to folding, but this hypothesis has yet to be tested.)

For the lysozyme and MetJ examples quoted above, the irreversible processes are usually too slow to have any serious effect on the DSC measurements, or they can be eliminated by addition of an appropriate reducing agent. Frequently, however, this is not the case, and serious distortion of DSC thermograms results from (usually exothermic) irreversible processes occurring simultaneously with thermal unfolding.

Figure 9. Effect of reducing agent on the reversibility of thermal unfolding of the methionine repressor protein (MetJ). (A) Repeat DSC scans of MetJ in the absence of reducing agent. (B) Repeat DSC scans of MetJ in the presence of 1 mM DTT (dithiothreitol). (C) Effect of DTT on the degree of reversibility of the MetJ thermal unfolding transition following different incubation periods above 45 °C (for details, see Cooper et al., 1992). [Panels (A) and (B): horizontal axis temperature (°C); panel (C): horizontal axis incubation time (min), with data labelled No DTT, 0.2 mM DTT, and 1 mM DTT.]

Thermal aggregation (precipitation) of unfolded protein is a particular problem. This is illustrated for yeast phosphoglycerate kinase (PGK) in Figure 10, where the shape of the thermogram is severely distorted by exothermic aggregation of the unfolded polypeptide, and the noisy post-transition baseline is a consequence of erratic convection effects of precipitated protein within the calorimeter cell. Such aggregation is rarely reversible, and rescans of such samples after cooling show no discernible transition. Even when no irreversible effects are immediately apparent from the shape of the DSC thermogram, a dependence on DSC scan rate can often indicate problems. Several groups (Sanchez-Ruiz et al., 1988; Galisteo et al., 1991; Lepock et al., 1992) have analyzed such situations in detail and have developed theoretical procedures that allow such experiments to yield both thermodynamic and kinetic information. Irreversibility (or nonreversibility) is also apparent in many noncalorimetric experiments, where it can be monitored by incomplete regain of enzyme activity, for example, or simply by the appearance of protein precipitate (see, for example, the discussion by Mitraki et al., 1987). The possible distortion that such effects may produce in equilibrium denaturation curves has, as yet, been less systematically explored.

Figure 10. DSC data for thermal denaturation of yeast phosphoglycerate kinase (PGK; 50 mM Pipes, pH 7.0) illustrating exothermic baseline distortion and noise effects caused by irreversible precipitation of unfolded protein (horizontal axis: temperature, °C).
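The roughly first-order loss of refolding capacity described above for MetJ suggests a simple way to summarize reversibility data of the Figure 9(C) kind as a single rate constant. The sketch below is a minimal illustration, assuming that the fraction of reversible transition f (for instance, the ratio of rescan to first-scan transition area) decays as f = exp(−kt) with incubation time t, and that f = 1 at t = 0; the function names and the numbers in the demonstration are hypothetical and are not the MetJ data.

```python
import math


def first_order_rate(times_min, fraction_reversible):
    """Least-squares rate constant k (min^-1) for f(t) = exp(-k t).

    Fits ln f = -k t through the origin, i.e. it assumes full
    reversibility at zero incubation time.
    """
    num = sum(t * math.log(f) for t, f in zip(times_min, fraction_reversible))
    den = sum(t * t for t in times_min)
    return -num / den


def half_life(k_per_min):
    """Incubation time after which half of the refolding capacity is lost."""
    return math.log(2.0) / k_per_min


if __name__ == "__main__":
    # Hypothetical reversibility fractions after incubation above the transition.
    times = [0.0, 25.0, 50.0, 100.0, 150.0, 200.0]
    fractions = [1.00, 0.80, 0.62, 0.37, 0.23, 0.14]
    k = first_order_rate(times, fractions)
    print(f"k = {k:.4f} min^-1, half-life = {half_life(k):.0f} min")
```

Fitting ln f rather than f itself makes the first-order assumption explicit: systematic curvature in a plot of ln f against t would signal that a single first-order process is not an adequate description.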
EFFECTS OF CROSSLINKING

The presence of irreversible crosslinks, in the form of -S-S- bridges between cysteine residues or other covalent links connecting regions of polypeptide, enhances the relative stability of the folded protein, and the introduction of such crosslinks is a very effective way of improving stability. The effect is primarily entropic, arising from topological constraints that reduce the number of configurations available to the unfolded chain (Schellman, 1955; Flory, 1956; Poland and Scheraga, 1965; Pace et al., 1988). In the absence of crosslinks, the distance between any two groups in the unfolded protein varies, with a probability distribution determined by the statistics of the polymer chain and a range that depends only on the length of the chain. A crosslink between two distant groups in the polymer forms a loop with a much more restricted set of possible chain configurations: the statistical distribution is limited to only those conformers that give an end-to-end chain distance consistent with the juxtaposition of groups enforced by the crosslink. For any one loop formed by crosslinking between groups n residues apart in the chain sequence, classical theories of polymer chain statistics (Jacobson and Stockmayer, 1950; Schellman, 1955; Flory, 1956) estimate the reduction in conformational entropy (ΔSconf) of the unfolded chain from the relative probability that the ends of a polymer chain will be found within the same volume element (vs):

$$\Delta S_{\mathrm{conf}} = R \ln\left[\left(\frac{3}{2\pi l^2 n}\right)^{3/2} v_s\right] \tag{47}$$
where l is the length of a statistical segment of the chain, usually taken to be 3.8 Å for a polypeptide. (This is for a single loop; the more complex situation of multiple, topologically dependent loops has been considered by Poland and Scheraga, 1965.) Various estimates of vs have been used. For a disulphide crossbridge, taking the distance of closest approach of the -SH groups as about 4.8 Å (Thornton, 1981), Pace et al. (1988) used vs = 57.9 Å³ (corresponding to a sphere of diameter 4.8 Å), giving

$$\Delta S_{\mathrm{conf}} = -8.8 - \tfrac{3}{2} R \ln n \quad \mathrm{J\,K^{-1}\,mol^{-1}} \tag{48}$$
which gave reasonable agreement with experiment for the decrease in folding free energy (δΔG = TΔSconf) of a series of proteins upon removal of specific disulphide bridges. Such agreement may be fortuitous, however, since there are various assumptions and approximations inherent in the above. In particular, it is assumed that the unfolded polypeptide behaves as a statistical random coil, with a Gaussian end-to-end probability distribution in the absence of crosslinks. This may be reasonable for relatively large loops in a good denaturing solvent, but it will probably overestimate the effect under the more realistic conditions that apply to most proteins, where the experimentally accessible unfolded state probably still contains residual conformation and is less than a random coil. Furthermore, these estimates assume that the crosslink effect lies simply in the configurational entropy of the unfolded chain, and that the presence of the crosslink introduces no conformational strain or other constraints in the native form. Doig and Williams (1991) have also argued that the presence of disulphide crosslinks in the unfolded polypeptide leads to strain and other additional effects that override the entropic effects, though earlier work appears to rule this out (Johnson et al., 1978). These various possibilities have been explored by more detailed thermodynamic measurements on specifically disulphide-modified proteins (Cooper et al., 1992; Kuroki et al., 1992), with somewhat divergent conclusions, though care must be exercised to ensure that the experimental modifications used do not introduce additional destabilizing effects into the folded protein in the form of bulky or charged substituents.

DSC comparison of the thermal unfolding of native (4-disulphide) and a specific 3-disulphide form of hen egg white lysozyme is illustrated in Figure 11 (Cooper et al., 1992). Removal of the Cys6-Cys127 crossbridge results in a reduction in Tm of 25 to 30 °C under the same conditions, together with a reduction in ΔHm. However, because of the inherent variation of ΔHm with temperature (the ΔCp effect), it is not possible from one such experiment alone to determine the source of destabilization. Comparison of ΔHm for these proteins over a range of temperatures (by conducting experiments over a range of pH) shows that, within experimental uncertainty, the enthalpies of unfolding of the two proteins fall on the same line and that, for unfolding at the same temperature, the enthalpies are the same. Consequently, any difference in folding stability must arise solely from entropy differences. The values of ΔSunf for the two proteins (Figure 12) differ by about 90 J K⁻¹ mol⁻¹ over the pH range studied, in reasonable agreement with theoretical estimates for a 122-residue loop (Pace et al., 1988). The disulphide modification used here and the location of this particular crossbridge in the native structure are such that minimal perturbation of the folded protein is expected, and this is confirmed by NMR studies (Radford et al., 1991).

Figure 11. DSC comparison of thermal unfolding of native (4-disulphide) and CM6,127 (3-disulphide) lysozyme at pH 3.8, 50 mM glycine/HCl buffer (horizontal axis: temperature, °C).

Qualitatively similar results have been obtained in a recent comparison of the thermal unfolding of native bovine α-lactalbumin and a modified form lacking the equivalent 6-120 disulphide bond (Robertson, Creighton, and Cooper, unpublished). Here, however, although the enthalpies of unfolding of the two forms of the protein are similar when compared at the same temperature, the destabilizing effect and the entropy difference are somewhat smaller than would be anticipated for a loop of this size using the theory above. There are several possible reasons for this discrepancy. First, "unfolded" α-lactalbumin is known to exist in a range of different conformational subclasses (including the "molten globule") depending on conditions, and it is unlikely to behave as a fully random coil upon thermal unfolding. The system is further complicated by the Ca²⁺ binding of this protein: Ca²⁺ or other cation binding to the unfolded polypeptide might produce transient noncovalent crossbridges and further restrict the conformational freedom of the chain. Moreover, tryptophan fluorescence-quenching experiments on the folded protein (unpublished) indicate that removal of this disulphide link increases the accessibility of some Trp residues to small-molecule quenchers, suggesting that the conformation or conformational dynamics of the native form are also affected by removal of this crossbridge. No NMR or crystallographic data are yet available to check this more thoroughly.

Figure 12. Variation with pH of the entropy (ΔSunf, upper panel) and free energy (ΔGunf, lower panel) of unfolding at 25 °C of native lysozyme and its 3-disulphide form. The lines in the lower panel show the free energy behavior expected for an uptake of 3 H⁺ ions per molecule during unfolding.
By contrast, studies by Kuroki and colleagues (1992) of mutant human lysozymes lacking the disulphide bridge between cysteine residues 77 and 95 indicate that the observed destabilization in this case is enthalpic, with a paradoxically smaller unfolding entropy for the mutants lacking this crosslink. The difference here may arise because the Cys77-Cys95 crosslink involves a relatively tight loop and is buried within the protein structure, rather than lying close to the surface as in the previous examples. Consequently, removal of this crossbridge is likely to have a significantly greater effect on the native structure and dynamics. Kuroki and colleagues (1992) indeed showed that removal of this link increased the flexibility of the native state, thereby increasing the entropy of the folded form of the protein. More recent studies on another protein (Vogl et al., 1995) confirm the general trend that the relative contributions of enthalpic and entropic terms to folding stability depend on loop length and the position of the crossbridge. Destabilization involving large loops tends to be purely entropic, as expected from the classic picture, but enthalpy effects play a greater role for shorter loops.
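Equations (47) and (48) are easy to evaluate directly, which makes the random-coil estimate concrete for any loop length. The sketch below is a minimal illustration, assuming the Gaussian random-coil model with the parameter values quoted above (l = 3.8 Å, and vs taken as the volume of a sphere of diameter 4.8 Å); it reproduces the constant of about −8.8 J K⁻¹ mol⁻¹ in Eq. (48). The function names are hypothetical, and the loop lengths in the demonstration are illustrative only.

```python
import math

R = 8.314  # gas constant, J K^-1 mol^-1


def sphere_volume(diameter_a: float) -> float:
    """Volume (A^3) of a sphere of the given diameter, used here as v_s."""
    return math.pi * diameter_a ** 3 / 6.0


def delta_s_conf(n: int, l_a: float = 3.8, v_s: float = sphere_volume(4.8)) -> float:
    """Eq. (47): R ln[(3 / (2 pi l^2 n))^(3/2) v_s], in J K^-1 mol^-1.

    The bracket is the Gaussian end-to-end probability density at zero
    separation (A^-3) multiplied by the capture volume v_s (A^3).
    """
    density = (3.0 / (2.0 * math.pi * l_a ** 2 * n)) ** 1.5
    return R * math.log(density * v_s)


def folding_dg_change(n: int, temp_k: float = 298.15) -> float:
    """Change in folding free energy on removing the crosslink, T * dS_conf (kJ/mol)."""
    return temp_k * delta_s_conf(n) / 1000.0


if __name__ == "__main__":
    print(f"constant term of Eq. (48): {delta_s_conf(1):.2f} J/K/mol")
    for n in (20, 50, 122):
        print(f"n = {n:3d}: dS_conf = {delta_s_conf(n):7.1f} J/K/mol, "
              f"T*dS_conf = {folding_dg_change(n):6.1f} kJ/mol")
```

Comparing such estimates with measured stability changes is subject to all the caveats listed above: residual structure in the unfolded state, strain in the native form, and, for short or buried loops, enthalpic effects that the model ignores.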
FIBROUS PROTEINS

Relatively little systematic work has been done on the thermodynamics of folding of fibrous or other nonglobular proteins. Experimentally, such proteins are frequently more difficult to work with. They are often poorly soluble and difficult to purify to homogeneity in sufficient quantities for biophysical studies. They generally have high molecular weights and are made up of several long polypeptide chains, which makes them prone to aggregation and entanglement when unfolded. Their unfolding transitions are therefore often irreversible on the experimental timescale, and often noncooperative or non-two-state, which makes thermodynamic analysis difficult. In addition, relatively little is usually known about their structure, even in the folded state, since they are less amenable to high-resolution crystallographic studies. Consequently, theoretical analysis of their folding interactions is less secure. Amino acid side chains in such proteins may frequently remain exposed to solvent on the outside of the elongated chain structure even in the folded state, so factors such as burial of hydrophobic groups should be of less significance.

Early work on the collagen family of proteins, based on variations in experimental Tm values for a range of naturally occurring tropocollagens with varying proline and hydroxyproline contents, showed indirectly that backbone hydrogen bonding between polypeptide chains in this triple-stranded structure is unlikely to be the dominant stabilizing force (Cooper, 1971, and references therein). These proteins are unusual in containing large numbers of proline and hydroxyproline residues at regular positions in their primary structures, and the number of available interchain peptide H-bonding groups will decrease with increasing imino acid content. Paradoxically, the estimated heat of unfolding (ΔH) increases with increasing pro + hypro content; that is, unfolding of the collagen triple helix becomes more endothermic the fewer the interchain hydrogen bonds. The increased thermal stability of collagens with higher pro + hypro content comes mainly from the decrease in rotational degrees of freedom of the unfolded chains, because of the restrictions on backbone rotation imposed by the pyrrolidine ring of the proline or hydroxyproline side chain. This reduces the conformational entropy of the unfolded chain and therefore indirectly stabilizes the folded structure. The additional enthalpic contributions seem to come from regular solvation effects, possibly involving extended hydrogen-bonded chains of water molecules acting as a sort of "aqueous scaffolding" at the surface of the triple helix. Such interactions are impossible to model or mimic in small-molecule systems and are therefore difficult to characterize thermodynamically. More recent calorimetric and other work (reviewed in Privalov, 1982) has confirmed the anomalous enthalpy behavior of collagen unfolding and the intimate role of water.

Work on other fibrous proteins is less extensive, with the possible exception of the myosin/tropomyosin family of α-helical coiled-coil proteins (Privalov, 1982). Thermal unfolding of these proteins is a highly noncooperative process involving several overlapping transitions over an extended range of temperatures. This probably represents the unfolding of various independent or semi-independent domains in these large proteins and makes thermodynamic analysis difficult.
MEMBRANE PROTEINS
We expect that the factors governing the thermodynamic stability of membrane proteins should, in principle, differ significantly from those for water-soluble proteins. In some ways they might be simpler. Unfolding of a protein lying entirely within the nonpolar lipid bilayer would involve none of the complications of aqueous solvation or hydrophobic interactions. It would presumably be dominated by the breaking of H-bonds and other polar interactions in the folded protein. Unfortunately, this neglects the two-phase nature of the system. Membrane proteins frequently have loops of polypeptide exposed to the aqueous phase, and the extent of exposure may well change during folding/unfolding reactions. Experimental data are sparse because of the intrinsic technical difficulties associated with measurements on membrane proteins. In addition, the lack of comprehensive structural data on such systems makes interpretation difficult. Some calorimetric data on the unfolding of rhodopsin and bacteriorhodopsin have been obtained (Miljanich et al., 1985; Kahn et al., 1992), including the role of retinal binding and loop regions. Interestingly, it appears that, at least in this case, ligand binding and interhelical loops are less significant for protein stability than the side-by-side interactions between helices within the membrane. The precise nature of these side-by-side interactions, however, has not yet been established.
FINALE
Why proteins fold is still a bit of a mystery. The opposing thermodynamic forces are so delicately balanced that it is difficult to decide which, if any, predominate, and indeed the balance may differ from one protein to another. Nevertheless, the further we get into this intriguing problem, the more we learn about the nature of biomolecular interactions and how they have been fine-tuned during evolution to meet biological needs. Chris Anfinsen himself was often pessimistic about the protein folding problem, expressing it this way: if there are N proteins in the entire world, then by the time we have solved the structures of (N − 1) of them, perhaps (and only perhaps!) we might accurately predict the structure of the Nth. We still have some way to go.
NOTES
1. Around 1971, I shared a rather dilapidated and now demolished office with Chris Anfinsen in South Parks Road, Oxford, during his sabbatical visit to the Molecular Biophysics Laboratory shortly before he won the Nobel Prize. Chris was a visiting fellow of All Souls College (or "Old Souls" as he usually liked to call it), and I was a still-wet-behind-the-ears postdoc. Memories of his charm, intellect, friendliness, and scientific humility have been a guiding influence ever since.
2. For a 100-residue protein, even allowing just three possible φ-ψ angles per peptide group would give rise to $3^{100} \approx 5 \times 10^{47}$ possible different conformations of the polypeptide chain. Such unimaginably large numbers gave rise to the "Levinthal paradox" (Levinthal, 1968; Dill, 1993). According to this paradox, there is insufficient time, even in the known lifetime of the universe, for any polypeptide to explore all these possibilities to find the "right" one.
3. For technical reasons, the superscript zeros in ΔG° and ΔS° are important: they designate changes occurring under standard-state conditions. In the simple A ⇌ B isomerization example here, only the concentration ratios matter, not their absolute values. In more general cases, where the number of molecules can change during reaction, we must correct for entropy-of-mixing contributions or relate everything to defined standard states. In contrast, the variation in enthalpy with concentration is normally insignificant, and it is usually permissible to use ΔH and ΔH° interchangeably. See any standard thermodynamics text for details.
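The arithmetic in note 2 can be checked directly. The Python sketch below reproduces the count 3^100. It then converts that count into an exhaustive search time, assuming a hypothetical sampling rate of 10^13 conformations per second. That rate is an illustrative assumption, not a figure from the text.

```python
# Levinthal-style count from note 2: three phi-psi choices per peptide group.
RESIDUES = 100
STATES_PER_RESIDUE = 3

n_conformations = STATES_PER_RESIDUE ** RESIDUES  # exact integer arithmetic
print(f"conformations: {n_conformations:.2e}")     # conformations: 5.15e+47

# Assumed (hypothetical) sampling rate: one new conformation every 1e-13 s.
RATE_PER_SECOND = 1e13
SECONDS_PER_YEAR = 365.25 * 24 * 3600

search_years = n_conformations / RATE_PER_SECOND / SECONDS_PER_YEAR
print(f"exhaustive search: {search_years:.1e} years")  # exhaustive search: 1.6e+27 years
```

The result, about 1.6 × 10^27 years, exceeds the roughly 1.4 × 10^10-year age of the universe by some 17 orders of magnitude. This is the sense of the paradox.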
REFERENCES
Anfinsen, C.B. (1973). Principles that govern the folding of protein chains. Science 181, 223-230.
Anfinsen, C.B. and Scheraga, H.A. (1975). Experimental and theoretical aspects of protein folding. Adv. Protein Chem. 29, 205-300.
Antosiewicz, J., McCammon, J.A., and Gilson, M.K. (1994). Prediction of pH-dependent properties of proteins. J. Mol. Biol. 238, 415-436.
Baldwin, R.L. (1986). Temperature dependence of the hydrophobic interaction in protein folding. Proc. Natl. Acad. Sci. USA 83, 8069-8072.
Brandts, J.F. (1964). The thermodynamics of protein denaturation. I. The denaturation of chymotrypsinogen. J. Am. Chem. Soc. 86, 4291-4301.
Brandts, J.F. and Lin, L.-N. (1990). Study of strong to ultratight protein interactions using differential scanning calorimetry. Biochemistry 29, 6927-6940.
Brooks, C.L., Karplus, M., and Pettitt, B.M. (Eds.) (1988). Proteins: A theoretical perspective of dynamics, structure, and thermodynamics. Wiley Interscience, New York.
Chan, H.S., Bromberg, S., and Dill, K.A. (1995). Models of cooperativity in protein folding. Phil. Trans. R. Soc. Lond. B 348, 61-70.
Cooper, A. (1971). Thermal stability of tropocollagens: Are hydrogen bonds really important? J. Mol. Biol. 55, 123-127.
Cooper, A. (1976). Thermodynamic fluctuations in protein molecules. Proc. Natl. Acad. Sci. USA 73, 2740-2741.
Cooper, A. (1984). Protein fluctuations and the thermodynamic uncertainty principle. Prog. Biophys. Molec. Biol. 44, 181-214.
Cooper, A. (1992). Effect of cyclodextrins on the thermal stability of globular proteins. J. Am. Chem. Soc. 114, 9208-9209.
Cooper, A., Eyles, S.J., Radford, S.E., and Dobson, C.M. (1992). Thermodynamic consequences of the removal of a disulphide bridge from hen lysozyme. J. Mol. Biol. 225, 939-943.
Cooper, A. and McAuley-Hecht, K.E. (1993). Microcalorimetry and the molecular recognition of peptides and proteins. Phil. Trans. R. Soc. Lond. A 345, 23-35.
Creighton, T.E. (1988). Disulphide bonds and protein stability. BioEssays 8, 57-63.
Creighton, T.E. (1991). Stability of folded conformations. Curr. Opin. Struct. Biol. 1, 5-16.
Creighton, T.E. (Ed.) (1992). Protein Folding. W.H. Freeman and Co., New York.
Dao-pin, S., Anderson, D.E., Baase, W.A., Dahlquist, F.W., and Matthews, B.W. (1991). Structural and thermodynamic consequences of burying a charged residue within the hydrophobic core of T4 lysozyme. Biochemistry 30, 11521-11529.
Dill, K.A. (1990a). Dominant forces in protein folding. Biochemistry 29, 7133-7155.
Dill, K.A. (1990b). The meaning of hydrophobicity. Science 250, 297.
Dill, K.A. (1993). Folding proteins: Finding a needle in a haystack. Curr. Opin. Struct. Biol. 3, 99-103.
Dill, K.A., Bromberg, S., Yue, K., Fiebig, K.M., Yee, D.P., Thomas, P.D., and Chan, H.S. (1995). Principles of protein folding: A perspective from simple exact models. Protein Sci. 4, 561-602.
Dill, K.A. and Stigter, D. (1995). Modeling protein stability as heteropolymer collapse. Adv. Protein Chem. 46, 59-104.
Doig, A.J. and Sternberg, M.J.E. (1995). Side-chain conformational entropy in protein folding. Protein Sci. 4, 2247-2251.
Doig, A.J. and Williams, D.H. (1991). Is the hydrophobic effect stabilizing or destabilizing in proteins? The contribution of disulphide bonds to protein stability. J. Mol. Biol. 217, 389-398.
Dunitz, J.D. (1995). Win some, lose some: Enthalpy-entropy compensation in weak intermolecular interactions. Chem. Biol. 2, 709-712.
Edsall, J.T. (1995). Hsien Wu and the first theory of protein denaturation (1931). Adv. Protein Chem. 46, 1-5.
Eisenberg, D. and Kauzmann, W. (Eds.) (1969). The Structure and Properties of Water. Oxford University Press, London.
Fersht, A.R., Matouschek, A., and Serrano, L. (1992). The folding of an enzyme. 1. Theory of protein engineering analysis of stability and pathway of protein folding. J. Mol. Biol. 224, 771-782.
Flory, P.J. (1956). Theory of elastic mechanisms in fibrous proteins. J. Am. Chem. Soc. 78, 5222-5235.
Franks, F. (1995). Protein destabilization at low temperatures. Adv. Protein Chem. 46, 105-139.
Fukada, H., Sturtevant, J.M., and Quiocho, F.A. (1983). Thermodynamics of the binding of L-arabinose and of D-galactose to the L-arabinose-binding protein of Escherichia coli. J. Biol. Chem. 258, 13193-13198.
Galisteo, M.L., Mateo, P.L., and Sanchez-Ruiz, J.M. (1991). Kinetic study on the irreversible thermal denaturation of yeast phosphoglycerate kinase. Biochemistry 30, 2061-2066.
Gilson, M.K. and Honig, B. (1986). The dielectric constant of a folded protein. Biopolymers 25, 2097-2119.
Griko, Y.V. and Privalov, P.L. (1994). Thermodynamic puzzle of apomyoglobin unfolding. J. Mol. Biol. 235, 1318-1325.
Griko, Y.V., Freire, E., and Privalov, P.L. (1994). Energetics of the α-lactalbumin states: A calorimetric and statistical thermodynamic study. Biochemistry 33, 1889-1899.
Grunwald, E. and Steel, C. (1995). Solvent reorganization and thermodynamic enthalpy-entropy compensation. J. Am. Chem. Soc. 117, 5687-5692.
Heringa, J., Frishman, D., and Argos, P. (1997). Computational methods relating peptide sequence and structure. In: Protein: A Comprehensive Treatise (Allen, G., Ed.), pp. 165-268. JAI Press, Greenwich, CT.
Honig, B., Sharp, K.A., and Yang, A.-S. (1993). Macroscopic models of aqueous solutions: Biological and chemical applications. J. Phys. Chem. 97, 1101-1109.
Honig, B. and Yang, A.-S. (1995). Free energy balance in protein folding. Adv. Protein Chem. 46, 27-58.
Hu, C.-Q., Sturtevant, J.M., Thomson, J.A., Erickson, R.E., and Pace, C.N. (1992). Thermodynamics of ribonuclease T1 denaturation. Biochemistry 31, 4876-4882.
Jackson, W.M. and Brandts, J.F. (1970). Thermodynamics of protein denaturation. Calorimetric study of the reversible denaturation of chymotrypsinogen and conclusions regarding the accuracy of the two-state approximation. Biochemistry 9, 2294-2301.
Jacobson, H. and Stockmayer, W.H. (1950). Intramolecular reaction in polycondensations. I. The theory of linear systems. J. Chem. Phys. 18, 1600-1606.
Johnson, C.M. and Fersht, A.R. (1995). Protein stability as a function of denaturant concentration: The thermal stability of barnase in the presence of urea. Biochemistry 34, 6795-6804.
Johnson, R.E., Adams, P., and Rupley, J.A. (1978). Thermodynamics of protein crosslinks. Biochemistry 17, 1479-1484.
Johnson, C.M., Cooper, A., and Stockley, P.G. (1992). Differential scanning calorimetry of thermal unfolding of the methionine repressor protein (MetJ) from Escherichia coli. Biochemistry 31, 9717-9724.
Kahn, T.W., Sturtevant, J.M., and Engelman, D.M. (1992). Thermodynamic measurements of the contributions of helix-connecting loops and of retinal to the stability of bacteriorhodopsin. Biochemistry 31, 8829-8839.
Kauzmann, W. (1959). Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14, 1-63.
Khechinashvili, N.N., Janin, J., and Rodier, F. (1995). Thermodynamics of the temperature-induced unfolding of globular proteins. Protein Sci. 4, 1315-1324.
Klotz, I.M. and Farnham, S.B. (1968). Stability of an amide-hydrogen bond in an apolar environment. Biochemistry 7, 3879-3882.
Kresheck, G.C. and Klotz, I.M. (1969). The thermodynamics of transfer of amides from an apolar to an aqueous solution. Biochemistry 8, 8-12.
Kuroki, R., Inaka, K., Taniyama, Y., Kidokoro, S., Matsushima, M., Kikuchi, M., and Yutani, K. (1992). Enthalpic destabilization of a mutant human lysozyme lacking a disulfide bridge between cysteine-77 and cysteine-95. Biochemistry 31, 8323-8328.
Lazaridis, T., Archontis, G., and Karplus, M. (1995). Enthalpic contributions to protein stability: Insights from atom-based calculations and statistical mechanics. Adv. Protein Chem. 47, 231-306.
Lee, B. (1991). Isoenthalpic and isoentropic temperatures and the thermodynamics of protein denaturation. Proc. Natl. Acad. Sci. USA 88, 5154-5158.
Lepock, J.R., Ritchie, K.P., Kolios, M.C., Rodahl, A.M., Heinz, K.A., and Kruuv, J. (1992). Influence of transition rates and scan rate on kinetic simulations of differential scanning calorimetry profiles of reversible and irreversible protein denaturation. Biochemistry 31, 12706-12712.
Levinthal, C. (1968). Are there pathways for protein folding? J. Chim. Phys. 65, 44-45.
Lumry, R. and Rajender, S. (1970). Enthalpy-entropy compensation phenomena in water solutions of proteins and small molecules: A ubiquitous property of water. Biopolymers 9, 1125-1227.
Makhatadze, G.I. and Privalov, P.L. (1992). Protein interactions with urea and guanidinium chloride: A calorimetric study. J. Mol. Biol. 226, 491-505.
Makhatadze, G.I. and Privalov, P.L. (1993). Contribution of hydration to protein folding thermodynamics. I. The enthalpy of hydration. J. Mol. Biol. 232, 639-659.
Makhatadze, G.I. and Privalov, P.L. (1995). Energetics of protein structure. Adv. Protein Chem. 47, 307-425.
Makhatadze, G.I. and Privalov, P.L. (1996). On the entropy of protein folding. Protein Sci. 5, 507-510.
Matouschek, A., Matthews, J.M., Johnson, C.M., and Fersht, A.R. (1994). Extrapolation to water of kinetic and equilibrium data for the unfolding of barnase in urea solutions. Protein Eng. 7, 1089-1095.
Miljanich, G., Brown, M., Mabrey-Gaud, S., Dratz, E., and Sturtevant, J.M. (1985). Thermotropic behavior of retinal rod membranes and dispersions of extracted phospholipids. J. Membr. Biol. 85, 79-86.
Mitraki, A., Betton, J.M., Desmadril, M., and Yon, J.M. (1987). Quasi-irreversibility in the unfolding-refolding transition of phosphoglycerate kinase induced by guanidine hydrochloride. Eur. J. Biochem. 163, 29-34.
Murphy, K.P. and Freire, E. (1992). Thermodynamics of structural stability and cooperative folding behavior in proteins. Adv. Protein Chem. 43, 313-361.
Murphy, K.P., Privalov, P.L., and Gill, S.J. (1990). Common features of protein unfolding and dissolution of hydrophobic compounds. Science 247, 559-561.
Naghibi, H., Tamura, A., and Sturtevant, J.M. (1995). Significant discrepancies between van't Hoff and calorimetric enthalpies. Proc. Natl. Acad. Sci. USA 92, 5597-5599.
Pace, C.N., Grimsley, G.R., Thomson, J.A., and Barnett, B.J. (1988). Conformational stability and activity of ribonuclease T1 with zero, one, and two intact disulfide bonds. J. Biol. Chem. 263, 11820-11825.
Pfeil, W. and Privalov, P.L. (1976a). Thermodynamic investigations of proteins. I. Standard functions for proteins with lysozyme as an example. Biophys. Chem. 4, 23-32.
Pfeil, W. and Privalov, P.L. (1976b). Thermodynamic investigations of proteins. II. Calorimetric study of lysozyme denaturation by guanidine hydrochloride. Biophys. Chem. 4, 33-40.
Pfeil, W. and Privalov, P.L. (1976c). Thermodynamic investigations of proteins. III. Thermodynamic description of lysozyme. Biophys. Chem. 4, 41-50.
Poland, D.C. and Scheraga, H.A. (1965). Statistical mechanics of noncovalent bonds in polyamino acids. VIII. Covalent loops in proteins. Biopolymers 3, 379-399.
Privalov, P.L. (1979). Stability of proteins: Small globular proteins. Adv. Protein Chem. 33, 167-241.
Privalov, P.L. (1982). Stability of proteins: Proteins which do not present a single cooperative system. Adv. Protein Chem. 35, 1-104.
Privalov, P.L. (1990). Cold denaturation of proteins. Crit. Rev. Biochem. Mol. Biol. 25, 281-305.
Privalov, P.L. (1992). Physical basis of the stability of the folded conformations of proteins. In: Protein Folding. W.H. Freeman and Co., New York.
Privalov, P.L. (1996). Intermediate states in protein folding. J. Mol. Biol. 258, 707-725.
Privalov, P.L. and Gill, S.J. (1988). Stability of protein structure and hydrophobic interactions. Adv. Protein Chem. 39, 191-234.
Privalov, P.L. and Khechinashvili, N.N. (1974). A thermodynamic approach to the problem of stabilization of globular protein structure: A calorimetric study. J. Mol. Biol. 86, 665-684.
Privalov, P.L. and Makhatadze, G.I. (1990). Heat capacity of proteins. II. Partial molar heat capacity of the unfolded polypeptide chain of proteins. J. Mol. Biol. 213, 385-391.
Privalov, P.L. and Makhatadze, G.I. (1993). Contribution of hydration to protein folding thermodynamics. II. The entropy and Gibbs energy of hydration. J. Mol. Biol. 232, 660-679.
Privalov, P.L. and Potekhin, S.A. (1986). Scanning calorimetry in studying temperature-induced changes in proteins. Methods Enzymol. 131, 4-51.
Privalov, P.L., Gill, S.J., and Murphy, K.P. (1990). The meaning of hydrophobicity (response). Science 250, 297-298.
Ptitsyn, O.B. (1995). Molten globule and protein folding. Adv. Protein Chem. 47, 83-229.
Radford, S.E., Woolfson, D.N., Martin, S.R., Lowe, G., and Dobson, C.M. (1991). A three-disulphide derivative of hen lysozyme: Structure, dynamics, and stability. Biochem. J. 273, 211-217.
Rose, G.D. and Wolfenden, R. (1993). Hydrogen-bonding, hydrophobicity, packing and protein folding. Annu. Rev. Biophys. Biomol. Struct. 22, 381-415.
Sanchez-Ruiz, J.M., Lopez-Lacomba, J.L., Cortijo, M., and Mateo, P.L. (1988). Differential scanning calorimetry of the irreversible thermal denaturation of thermolysin. Biochemistry 27, 1648-1652.
Santoro, M.M. and Bolen, D.W. (1992). A test of the linear extrapolation of unfolding free energy changes over an extended denaturant concentration range. Biochemistry 31, 4901-4907.
Santoro, M.M., Liu, Y., Khan, S.M.A., Hou, L.-X., and Bolen, D.W. (1992). Increased thermal stability of proteins in the presence of naturally occurring osmolytes. Biochemistry 31, 5278-5283.
Schellman, J.A. (1955). The stability of hydrogen-bonded peptide structures in aqueous solution. C.R. Trav. Lab. Carlsberg Ser. Chim. 29, 230-259.
Schellman, J.A. (1987a). The thermodynamic stability of proteins. Annu. Rev. Biophys. Biophys. Chem. 16, 115-137.
Schellman, J.A. (1987b). Selective binding and solvent denaturation. Biopolymers 26, 549-559.
Schmid, F.X., Mayr, L.M., Mucke, M., and Schonbrunner, E.R. (1993). Prolyl isomerases: Role in protein folding. Adv. Protein Chem. 44, 25-66.
Serrano, L., Kellis, J.T., Cann, P., Matouschek, A., and Fersht, A.R. (1992). The folding of an enzyme. 2. Substructure of barnase and the contribution of different interactions to protein stability. J. Mol. Biol. 224, 783-804.
Spolar, R.S., Livingstone, J.R., and Record, M.T. (1992). Use of liquid hydrocarbon and amide transfer data to estimate contributions to thermodynamic functions of protein folding from the removal of nonpolar and polar surface from water. Biochemistry 31, 3947-3955.
Stein, R.L. (1993). Mechanism of enzymatic and nonenzymatic prolyl cis-trans isomerization. Adv. Protein Chem. 44, 1-24.
Stigter, D. and Dill, K.A. (1990). Charge effects on folded and unfolded proteins. Biochemistry 29, 1262-1271.
Sturtevant, J.M. (1974). Some applications of calorimetry in biochemistry and biology. Annu. Rev. Biophys. Bioeng. 3, 35-51.
Sturtevant, J.M. (1987). Biochemical applications of differential scanning calorimetry. Annu. Rev. Phys. Chem. 38, 463-488.
Tanford, C. (1968). Protein denaturation. Adv. Protein Chem. 23, 121-275.
Tanford, C. (1970). Protein denaturation. Adv. Protein Chem. 24, 1-95.
Tanford, C. (1980). The Hydrophobic Effect: Formation of Micelles and Biological Membranes. Wiley Interscience, New York.
Thornton, J.M. (1981). Disulfide bridges in globular proteins. J. Mol. Biol. 151, 261-287.
Timasheff, S.N. (1992). Water as ligand: Preferential binding and exclusion of denaturants in protein unfolding. Biochemistry 31, 9858-9864.
Velicelebi, G. and Sturtevant, J.M. (1979). Thermodynamics of the denaturation of lysozyme in alcohol-water mixtures. Biochemistry 18, 1180-1186.
Vogl, T., Brengelmann, R., Hinz, H.-J., Scharf, M., Lotzbeyer, M., and Engels, J.W. (1995). Mechanism of protein stabilization by disulfide bridges: Calorimetric unfolding studies on disulfide-deficient mutants of the α-amylase inhibitor Tendamistat. J. Mol. Biol. 254, 481-496.
Weber, G. (1975). Energetics of ligand binding to proteins. Adv. Protein Chem. 29, 1-83.
Weber, G. (1993). Thermodynamics of the association and the pressure dissociation of oligomeric proteins. J. Phys. Chem. 97, 7108-7115.
Weber, G. (1995). van't Hoff revisited: Enthalpy of association of protein subunits. J. Phys. Chem. 99, 1052-1059.
Woolfson, D.N., Cooper, A., Harding, M.M., Williams, D.H., and Evans, P.A. (1993). Protein folding in the absence of the solvent ordering contribution to the hydrophobic interaction. J. Mol. Biol. 229, 502-511.
Wu, H. (1931). Studies on denaturation of proteins. XIII. A theory of denaturation. Chinese J. Physiol. 5, 321-344. (Reprinted in Adv. Protein Chem. (1995) 46, 6-26.)
Wyman, J. (1964). Linked functions and reciprocal effects in hemoglobin: A second look. Adv. Protein Chem. 19, 223-286.
Wyman, J. and Gill, S.J. (1990). Binding and Linkage: Functional chemistry of biological macromolecules. University Science Books, Mill Valley, CA.
Yancey, P.H., Clark, M.E., Hand, S.C., Bowlus, R.D., and Somero, G.N. (1982). Living with water stress: Evolution of osmolyte systems. Science 217, 1214-1222.
Yang, A.-S., Gunner, M.R., Sampogna, R., Sharp, K., and Honig, B. (1993). On the calculation of pKas in proteins. Proteins: Structure, Function, and Genetics 15, 252-265.
Chapter 7
Protein Hydrodynamics
STEPHEN E. HARDING
Abstract
Introduction
Hydrodynamic Techniques
Molar Mass (Molecular Weight) and Quaternary Structure
Gel Filtration and Size Exclusion Chromatography
Dynamic Light Scattering (DLS)
Sedimentation Velocity in the Analytical Ultracentrifuge
Sedimentation Equilibrium
Shape Measurement
Modelling Strategies: Spheres, Ellipsoids, Beads, and Bends
Intrinsic Viscosity
Sedimentation Velocity and Dynamic Light Scattering
Use of Concentration Dependence Parameters, Combined Shape Functions, and the Radius of Gyration Rg
Measurement and Use of Rotational Hydrodynamic Shape Functions: Fluorescence Depolarization Decay
Some Computer Programs for Conformational Analysis
Protein: A Comprehensive Treatise Volume 2, pages 271-305 Copyright © 1999 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 1-55938-672-X
ABSTRACT
This article gives the non-specialist a pointer to the various hydrodynamic methodologies available for characterising the size, dilute-solution conformation, and interaction properties of proteins. The virtue of combining data from different techniques is stressed, particularly in connection with conformation analysis and its associated uniqueness and hydration problems.
INTRODUCTION
Hydrodynamics provides the protein scientist with a powerful array of methodologies for investigating the mass, conformation, and interaction properties of proteins under solution conditions, the conditions in which they largely function in vivo. These methods can also play an important supporting role to the so-called "high-resolution" structural probes of X-ray crystallography and nuclear magnetic resonance. Nuclear magnetic resonance requires high concentrations (mass/volume) to give satisfactory spectra. Simple sedimentation velocity or sedimentation equilibrium runs in the analytical ultracentrifuge can therefore provide vital checks for any self-association behavior that could lead to misinterpretation of the chemical shift or related spectra. Hydrodynamic methods are generally rapid and nondestructive. These features have not escaped the notice of molecular biologists, who often have only very small amounts of material available. Hydrodynamic methods can provide early "low-resolution" information on a macromolecular structure prior to detailed crystallographic or high-resolution nuclear magnetic resonance analysis. Conversely, they can provide the finishing touches, refining a crystallographic model to account for dilute-solution behavior, especially in terms of intermolecular interaction phenomena (Schachman, 1989). The delicate intramolecular relationships between the subunits of a multienzyme complex can also be explored. In work now almost considered classical, H. K. Schachman and coworkers (Schachman et al., 1984) combined high-precision analytical ultracentrifuge measurements with the tools of molecular biology (production of point mutants). In this way they showed how such interactions in aspartate transcarbamoylase produce powerful allostery.

There have been many classic reviews on the application of hydrodynamic probes. Despite its age, C. Tanford's book (1961) is still regarded by many as the authority on the subject. The field has nevertheless advanced considerably since then, particularly in the analysis of molecular weight and molecular weight distributions and of interaction parameters. It has also advanced in hydrodynamic conformation modelling (triaxial ellipsoids, bead models, flexible particle analysis, etc.). The purpose of this article is thus to indicate some of the "late 1990s" state of the art in hydrodynamic methodology for investigating macromolecular conformation in dilute solution. This article, with the general protein scientist in mind, will
not give a comprehensive review of the theory, experimentation, and applications of all hydrodynamic methodologies. Instead, it aims to provide a pointer to the various methodologies. For each of the two classes of hydrodynamic measurement (mass and shape analysis), it will focus on certain techniques in more detail than others, which merely reflects the particular expertise of the author. For mass analysis, we focus on gel filtration and size exclusion chromatography (including on-line coupling with multiangle laser light scattering), dynamic light scattering, and sedimentation velocity and sedimentation equilibrium in the analytical ultracentrifuge. For shape measurement, we again focus on sedimentation velocity and dynamic light scattering. Alongside these we consider intrinsic viscosity, steady-state fluorescence depolarization, and the use of concentration dependence parameters, combined shape functions, and the radius of gyration. Because the treatment is selective, key follow-up references are given throughout.
HYDRODYNAMIC TECHNIQUES
By "hydrodynamic" (Greek for "water movement") techniques, we mean any technique involving motion of a macromolecule with, or relative to, the aqueous solvent in which it is dissolved or suspended. This definition therefore covers gel filtration and size exclusion chromatography, viscometry, and sedimentation (velocity and equilibrium). It also covers rotational diffusion probes (fluorescence anisotropy depolarization and electro-optical methods). In addition, it includes "classical" and "dynamic" light scattering, both of which (even "classical") derive from the relative motion of the macromolecular solute with respect to the solvent. Although this definition technically also includes electrophoretic methods, these will not be considered here. Suffice it to say, however, that electrophoretic methods are powerful tools for the separation, purification, and identification of proteins. With "SDS" methodology, they can also provide an estimate of polypeptide molecular weight. Careful use of cross-linking agents can also give an indication of quaternary structure, although correct application of other hydrodynamic methods gives a more precise picture. This article therefore considers the hydrodynamic determination of molecular weight, or "molar mass", and of quaternary structure (subunit composition and arrangement, self-association phenomena, and polydispersity). We will also consider the measurement of protein conformation in dilute solution.
MOLAR MASS (MOLECULAR WEIGHT) AND QUATERNARY STRUCTURE
For an unglycosylated polypeptide, a molar mass accurate to ±1 g/mol can be obtained from sequence information or from mass spectrometry. A similar precision cannot be obtained for glycosylated proteins because of the polydispersity arising from the variability of a cell's glycosylation process. Many proteins (and glycoproteins)
contain more than one noncovalently linked protein chain, particularly at higher concentrations. Important roles of hydrodynamic methods for mass analysis in protein chemistry are therefore to give the molar mass of the "intact" or "quaternary" structure. They also indicate the strength of binding between these noncovalently linked entities through measurement of association constants.

Gel Filtration and Size Exclusion Chromatography
The simplest hydrodynamic method for measuring molar mass is gel filtration (Ackers, 1975). It is also referred to as "gel permeation chromatography" or, now, "size exclusion chromatography", since the separation medium is assumed to be chemically inert. Gel filtration was originally conceived as a method for the separation and purification of macromolecules. Over the years, however, it has developed in its "calibrated" form into a very popular method for measuring protein molar masses under both native and dissociative conditions. The separation medium is a cross-linked gel. Traditionally, this has been made by allowing cross-linked polysaccharide or polyacrylamide beads to swell in water. The swollen gel is then packed into a glass- or metal-walled column, which is equilibrated with the buffer in which the macromolecules to be separated are dissolved. The degree of cross-linking dictates the separation range of the gel: looser gels separate larger molecules. Proper packing of columns requires some skill, and the user manuals supplied by commercial manufacturers are usually very comprehensive. Prepacked, metal-walled columns are now available for so-called "high-pressure" or "high-performance" liquid chromatography (HPLC). In HPLC, positive pressure is applied upstream of the column to accelerate separation, which makes the measurement particularly attractive for protein chemists. Gel filtration, or size exclusion chromatography, depends on the principle that some of the space inside the gel particles is available to smaller molecules but not to larger molecules, which are excluded.
Thus, when a solution is applied to the top of a properly packed gel column, only the dead space between gel particles is available to the excluded molecules. These therefore come off first when elution is commenced. Elution means adding buffer at a continuous rate or, equivalently with HPLC, injecting the solution into an already continuously running buffer stream. The excluded (larger) molecules thus have a smaller elution volume, Ve, and elute first from the column. Smaller macromolecules have progressively more space available to them as molar mass decreases, and are accordingly eluted only at higher values of Ve. "Biggest come off the column first" is the rule of thumb for size exclusion chromatography. The separation is sometimes expressed in terms of the partition coefficient Kav, defined by

$$V_e = V_0 + K_{av}\,(V_t - V_0) \qquad (1)$$

where V0 and Vt are the void volume and total volume of the column, respectively, determined from separate elutions using solute species having partition coefficients
of 0 (totally excluded) and 1 (non-excluded), respectively.

Proteins are monitored as they emerge from the column with a spectrophotometer. It is set either at 280 nm in the UV (Trp and Tyr residue absorption) or in the more sensitive far-UV (210-230 nm, peptide bond absorption), provided the buffer is reasonably transparent in the selected region. Buffer components such as ATP or azide can cause serious detection problems by producing an effective UV blackout. In these cases, a differential refractometer is an appropriate alternative to a spectrophotometer. Highly sensitive differential refractometers are now available, and these are arguably preferable in general as the detector of choice. The breadth of a peak eluting from a column does not necessarily mean that the component is polydisperse: it is more probably the result of diffusion effects. Empirically, the elution volume Ve of a protein and its molar mass M are related by the logarithmic expression (Ackers, 1975)

$$V_e = A - B\,\log_{10} M \qquad (2)$$
where A and B are properties of the column. This equation is valid over the fractionation range of the gel and forms the basis of calibrated gel chromatography (Ackers, 1975). To obtain the molar mass of a protein molecule or mixture of molecules, the column is first calibrated using standard proteins, or "markers," of known molar mass. Linear regression analysis is then used to evaluate A and B, and M can then be found from the measured value of Ve of the unknown protein. The calibration can only be applied within the gel's fractionation range, which depends on the pore size (Figure 1a). Fractionation ability is normally enhanced by running different gel columns in series, a practice common with HPLC systems because of their much shorter elution times. Equation 2 assumes that the fractionation is based on the size-exclusion principle alone. Separation mechanisms not governed by molecular size will tend to decouple the relation between molecular size and migration velocity, and the experimental elution profile will then not reflect differences in size (Barth, 1980). Equation 2, which also fails outside the fractionation range of the gel, works only for molecules of similar shape and conformation. Thus calibration using globular protein standards would be inappropriate for asymmetric proteins such as fibrinogen and the muscle proteins myosin and titin, and also for heavily glycosylated glycoproteins. The theory behind equation 2 is not rigorous, but for globular proteins at least it seems to represent the data very well. For linear macromolecules of limited stiffness, there appears to be growing acceptance that the separation is more a logarithmic function of the hydrodynamic volume of a macromolecule (∝ M[η], where [η] is the intrinsic viscosity) and its corresponding hydrodynamic or "effective" radius, rH, culminating in a proposal for a "universal calibration" (Dubin and Principi, 1989). This may be more appropriate for proteins in denaturing solvents, such as 6 M GuHCl in the presence of mercaptoethanol (a disulfide bond breaker); for these, wider-pore gels (e.g., Sepharose) are a more appropriate separation medium. These calibration problems can be avoided completely by coupling an absolute molar mass detector (a light scattering photometer) downstream from the column (Wyatt, 1992).
Figure 1. Size exclusion chromatography. (a) SEC logarithmic calibration plot for proteins eluting from a Sephadex G-200 column; abscissa: molar mass, M (g/mol). Reproduced with permission from Andrews (1965). (b) SEC elution volume/molecular weight relation obtained directly from SEC/MALLS for a glycoprotein (pig gastric mucin, ≈80% glycosylated); abscissa: elution volume, Ve (ml) (adapted from Jumel et al., 1996). (c) Molar mass distribution corresponding to (b); abscissa: molar mass, M (g/mol).
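The calibration procedure just described (fit equation 2 to markers by linear regression, then invert it for the unknown) can be sketched in a few lines of Python. This is a minimal illustration, not the software of any particular chromatography system: the function names and the marker data are hypothetical, and equation 1 is included only to show how Kav follows from Ve, V0, and Vt.

```python
import math


def partition_coefficient(ve, v0, vt):
    """K_av from equation 1: Ve = V0 + K_av * (Vt - V0)."""
    return (ve - v0) / (vt - v0)


def fit_calibration(masses, elution_volumes):
    """Least-squares fit of equation 2, Ve = A - B*log10(M); returns (A, B)."""
    xs = [math.log10(m) for m in masses]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(elution_volumes) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, elution_volumes))
    slope = sxy / sxx            # regression slope equals -B
    a = y_mean - slope * x_mean  # intercept equals A
    return a, -slope


def molar_mass_from_ve(ve, a, b):
    """Invert equation 2: M = 10**((A - Ve) / B); valid only within the calibrated range."""
    return 10 ** ((a - ve) / b)


if __name__ == "__main__":
    # Hypothetical markers: molar mass (g/mol) and elution volume (ml).
    markers_m = [1.0e4, 1.0e5, 1.0e6]
    markers_ve = [200.0, 150.0, 100.0]
    a, b = fit_calibration(markers_m, markers_ve)
    print("A =", a, "B =", b)
    print("M for Ve = 175 ml:", molar_mass_from_ve(175.0, a, b))
    print("K_av (V0 = 80 ml, Vt = 320 ml):", partition_coefficient(175.0, 80.0, 320.0))
```

Outside the range spanned by the markers, the inversion is an extrapolation and, as noted above, should not be trusted.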
This coupling, called "SEC/MALLS" (size exclusion chromatography coupled to multi-angle laser light scattering), is particularly valuable for the characterization of polydisperse, heavily glycosylated protein systems such as mucus glycoproteins, since it provides the relationship between elution volume and weight-average molar mass without recourse to calibration standards (Figure 1b) and also provides the molar mass or, for a heterogeneous system, the molar mass distribution (Figure 1c) and its associated molar mass averages (number average, Mn; weight average, Mw; and z-average, Mz). The coupled light scattering and refractive index detectors are so sensitive that only low concentrations are required, and problems from thermodynamic nonideality are therefore usually negligible.
Dynamic Light Scattering (DLS)
Although the light scattering photometer used in the SEC/MALLS application above (often described as "static" light scattering) is not thought of as a classical "hydrodynamic" probe (even though, technically, it too is derived from motions of macromolecules relative to solvent), the technique of dynamic light scattering undoubtedly has a firm hydrodynamic basis and now appears to be the method of choice for the measurement of translational diffusion coefficients. In addition, via an approximation or by combination with sedimentation measurements (see below), this method also provides an estimate of the molar mass. The appearance of simple-to-use, fixed-angle (90°) dynamic light scattering photometers has made dynamic light scattering an increasingly popular tool amongst protein chemists. After certain assumptions and approximations, largely involving an assumed spherical shape, surprisingly reliable estimates of the molar mass of globular proteins have been obtained (Claes et al., 1992). When used in isolation, this method of molar mass measurement is, like gel filtration, a relative one, requiring calibration with standard proteins of known molar mass. For asymmetric proteins such as fibrinogen and myosin, the single-angle approximation fails, but extraction of molar mass and related parameters is still possible if multiangle instruments are used and the primary parameter obtained from dynamic light scattering measurements, the translational diffusion coefficient D (cm² s^-1), is combined with results from sedimentation analysis in the analytical ultracentrifuge (see below). For a recent comprehensive treatment of the technique, the reader is referred to Brown's book (1995); for more introductory treatments, see Schmitz (1990) and an article by Johnson (1984).
Although the theory is complex, the principle of dynamic light scattering experiments is simple and is based on the high intensity, monochromaticity, collimation, and coherence of laser light. Laser light is directed onto a protein solution in a temperature-controlled bath, and the scattered intensity at either a single angle or multiple angles is recorded using a photomultiplier/photodetector system. The intensities recorded fluctuate with time because of the Brownian diffusive motions of the macromolecules: this movement causes a "Doppler" type of wavelength broadening of the otherwise monochromatic light incident on the protein molecules, and the beating between waves of different but similar wavelengths causes the intensity to fluctuate. How rapidly the intensity fluctuates (over ns to µs time intervals) depends on the mobility, or diffusivity, of the protein molecules. A purpose-built computer called an autocorrelator, as its name indicates, "correlates" or interprets these fluctuations. It does this by evaluating a "normalized intensity autocorrelation function," g(2), as a function of "delay time," τ (µs to ms). The decay of the correlation, g(2)(τ), as a function of τ, averaged over longer time intervals (usually of the order of minutes), can then be used by an interfaced PC (or the equivalent) to obtain D. (Larger and/or asymmetric particles, which move more sluggishly, have slower intensity fluctuations, a slower decay of g(2)(τ) with τ, and hence smaller D values than smaller and/or more globular particles.) The delay time τ is itself the product of the "channel number," b (taking all integral values between 1 and 64, or up to 128 or 256 depending on how expensive the correlator is), and a user-set "sample time," τs (typically ≈100 ns for a rapidly diffusing low molar mass [M ≈ 20000 g/mol] enzyme, increasing up to around milliseconds for microbes). In the past, τs was selected by trial and error, but modern data-capture software usually does this automatically. For spherical particles, a single-term exponential describes the decay of g(2) with τ:

$$g^{(2)}(\tau) - 1 = e^{-Dk^2\tau} \qquad (3)$$

where k is the Bragg wave vector, whose magnitude is defined by

$$k = \frac{4\pi n}{\lambda}\,\sin\!\left(\frac{\theta}{2}\right) \qquad (4)$$

and where n is the refractive index of the medium, θ is the scattering angle, and λ (cm) is the wavelength of the incident light. Equation 3 can reasonably be applied to quasi-spherical particles such as globular proteins or spheroidal protein assemblies (Figure 2b).
Figure 2. Dynamic light scattering. (a) Experimental set-up (labelled blocks include Amp/Disc, Correlator, and PC Store). (b) Normalized autocorrelation decay plot, versus channel number, for the protein assembly dynein (in 40 mM NaCl): D20,w = 1.1 × 10^-7 cm² s^-1; M (from equation 10) = 2.5 × 10^6 g/mol (adapted from Wells et al., 1990). (c) "MHKS" double-logarithmic calibration plot of rH versus M (abscissa log10 M): (1) thyroglobulin; (2) apoferritin; (3) IgG; (4) yeast alcohol dehydrogenase; (5) hexokinase; (6) amyloglucosidase; (7) horse alcohol dehydrogenase; (8) transferrin; (9) bovine serum albumin; (10) hemoglobin; (11) hexokinase subunit; (12) ovalbumin; (13) carbonic anhydrase; (14) chymotrypsinogen; (15) myoglobin; (16) lysozyme; (17) ribonuclease A. Reproduced with permission from Claes et al. (1992).
Fixed-Angle (90°) DLS Photometer
For globular proteins and spheroidal assemblies, application of equation 3 at a single fixed angle is usually sufficient. Low angles are usually avoided because they magnify problems due to any contamination with dust or other supramolecular particles, so an angle of 90° is normally used. For a given laser power and protein concentration, the smaller the protein, the lower the intensity of scattered light and hence the longer the averaging required to give a sufficient signal. A commercial instrument based on this single fixed-angle principle is available (Claes et al., 1992). To obtain molar mass information from D, a calibration curve of log D versus log M is produced from globular protein standards (this is known as an "MHKS" {Mark-Houwink-Kuhn-Sakurada} relation; see, for example, Harding, 1995), and the approximation is made that this relation holds for the protein whose molar mass is being sought. Figure 2c shows such a calibration plot (the D values have been converted to hydrodynamic radius values; see below).
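The fixed-angle analysis can be sketched as follows, assuming the correlator output is available as pairs of delay time τ = bτs and g(2)(τ). The sketch computes k from equation 4, obtains D from the slope of ln[g(2)(τ) - 1] against τ following equation 3 as written, and then reads M off an MHKS (log D versus log M) calibration. All function names, the synthetic data, and the free intercept in the fit are illustrative assumptions; commercial packages differ in detail.

```python
import math


def wave_vector(n, wavelength_cm, theta_deg):
    """Magnitude of the wave vector k (cm^-1) from equation 4."""
    return (4.0 * math.pi * n / wavelength_cm) * math.sin(math.radians(theta_deg) / 2.0)


def linear_fit(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xm) ** 2 for x in xs)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, ym - slope * xm


def diffusion_from_g2(taus_s, g2, k):
    """Equation 3 as written: ln(g2 - 1) = c - D*k^2*tau, so D = -slope / k^2.

    The intercept c is left free to absorb an instrumental coherence factor
    (an assumption). If your software reports the homodyne form
    g2 - 1 = beta*exp(-2*D*k^2*tau), divide the slope by 2*k^2 instead.
    """
    pts = [(t, math.log(g - 1.0)) for t, g in zip(taus_s, g2) if g > 1.0]
    slope, _ = linear_fit([t for t, _ in pts], [y for _, y in pts])
    return -slope / k ** 2


def mhks_calibration(masses, diffusion_coeffs):
    """Fit log10 D = intercept + slope*log10 M to globular-protein standards."""
    return linear_fit([math.log10(m) for m in masses],
                      [math.log10(d) for d in diffusion_coeffs])


def mass_from_diffusion(d, slope, intercept):
    """Invert the MHKS calibration for an unknown protein."""
    return 10 ** ((math.log10(d) - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical 90 degree measurement in water (n = 1.333) with a 632.8 nm laser.
    k = wave_vector(1.333, 632.8e-7, 90.0)
    taus = [i * 2.0e-6 for i in range(1, 33)]      # tau = b * tau_s, tau_s = 2 us
    g2 = [1.0 + math.exp(-6.0e-7 * k ** 2 * t) for t in taus]
    d = diffusion_from_g2(taus, g2, k)
    # Hypothetical standards obeying D = 3.0e-5 * M**-0.35 (cm^2/s, g/mol).
    std_m = [1.4e4, 6.6e4, 1.5e5, 4.4e5]
    std_d = [3.0e-5 * m ** -0.35 for m in std_m]
    slope, intercept = mhks_calibration(std_m, std_d)
    print("k =", k, "cm^-1; D =", d, "cm^2/s; M =", mass_from_diffusion(d, slope, intercept))
```

The single straight-line fit is only appropriate when the decay is close to a single exponential; curvature signals the polydispersity discussed further on.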
Other approximations and practical requirements apply to the operation of this type of fixed-angle instrument:
1. Solutions have to be as free as possible from dust and supramolecular aggregates. This requirement is met by injecting the sample into the (scrupulously clean) scattering cell via Millipore filter(s) of appropriate pore size (0.1-0.45 µm).
2. The diffusion coefficient is a sensitive function of temperature and of the viscosity of the solvent (itself sensitive to temperature), and the log D versus log M relationship must correspond to the same temperature.
3. The diffusion coefficient measured at a single concentration is an apparent one, Dapp, because of nonideality effects (finite volume and charge). These effects become vanishingly small as the concentration c → 0. The approximation, usually reasonable for proteins, is made that Dapp ≈ D° or that nonideality effects are the same as for the calibration standards.
Despite these approximations, diffusion coefficients and molar masses obtained in this way with fixed-angle instruments have been remarkably reliable. For nonglobular proteins, however, the log D versus log M calibration becomes invalid and equation 3 no longer applies: an instrument with a multiangle facility must then be used.
Multiangle Instruments
Measurements using multiangle equipment are more time-consuming, and the instrumentation is larger and more expensive. Data analysis is also more complicated. Equation 3 no longer applies, largely because of the added complication of rotational diffusion effects. These effects vanish, however, as the scattering angle θ → 0. It is therefore possible to use equation 3 in terms of an apparent diffusion coefficient, Dapp, with contributions from both concentration and rotational diffusion effects. Dapp is measured at several angles and extrapolated back to zero angle to give D° if concentration effects are negligible. If concentration-dependence effects are suspected, however, a double extrapolation of Dapp (or the equivalent autocorrelation function) to zero angle and zero concentration can be performed on the same plot, called a "dynamic Zimm plot" (Burchard, 1992). The common intercept gives the "ideal" (in a thermodynamic sense) diffusion coefficient, D°. Because this quantity depends not only on the protein but also on the viscosity η (poise) and temperature T (K) of the buffer, it has to be corrected to standard conditions (viscosity of pure water at 20 °C, η20,w), either before or after the extrapolation (van Holde, 1985), as follows:

$$D^\circ_{20,w} = D^\circ\cdot\frac{\eta}{\eta_{20,w}}\cdot\frac{293.15}{T} \qquad (5)$$
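Equation 5 and the Stokes relation stated next (equation 6) are simple enough to evaluate directly. The sketch below works in CGS units to match the text (D in cm² s^-1, η in poise); the constants are approximate literature values supplied here as assumptions, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-16   # Boltzmann constant, erg/K (CGS units)
ETA_20W = 0.01002    # viscosity of water at 20 °C, poise (approximate literature value)


def d_20w(d0, eta, temp_k):
    """Equation 5: D°20,w = D° * (eta / eta20,w) * (293.15 / T)."""
    return d0 * (eta / ETA_20W) * (293.15 / temp_k)


def hydrodynamic_radius_cm(d0_20w):
    """Equation 6 (Stokes): rH = kB*T / (6*pi*eta20,w*D°20,w), with T = 293.15 K."""
    return K_B * 293.15 / (6.0 * math.pi * ETA_20W * d0_20w)


if __name__ == "__main__":
    # Hypothetical value measured at 25 °C in a buffer of viscosity 0.00895 poise.
    d_std = d_20w(6.8e-7, 0.00895, 298.15)
    print("D°20,w =", d_std, "cm^2/s; rH =", hydrodynamic_radius_cm(d_std) * 1e7, "nm")
```

Because D is proportional to T/η, a measurement made in a warmer, less viscous buffer is corrected downward to 20 °C water.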
The size of a protein, as represented by its equivalent hydrodynamic radius rH, is related to D°20,w by the Stokes equation:

$$r_H = \frac{k_B T}{6\pi\,\eta_{20,w}\,D^\circ_{20,w}} \qquad (6)$$

where kB is Boltzmann's constant (and T = 293.15 K when D°20,w is used). To obtain an absolute measure of the molar mass, M, of a protein from D°20,w without assumptions concerning its shape requires combination with the sedimentation coefficient from the analytical ultracentrifuge, as described below. Some modern software attempts to evaluate M directly from the diffusion coefficient; this should be treated with some caution. For multiangle measurements, preferences vary as to the type of cuvet used. Square cuvets are optically more reliable, but the cell corners obviously prohibit some scattering angles. Cylindrical cuvets, if used, should be of the wide-diameter type (>2 cm) to avoid internal and stray reflections. Scrupulous attention to sample and cuvet clarity is mandatory, particularly for macromolecules of M < 100000 g/mol, which give low scattering signals, and also if low angles are employed, where the effects of supramolecular contaminants are at their maximum; special cuvet-filling arrangements are used for clarification purposes (Sanders and Cannell, 1980). The angular extrapolation of Dapp can also provide an estimate of the rotational diffusion coefficient, albeit with lower precision than conventional methods (fluorescence depolarization, electric birefringence). If the protein is polydisperse or self-associating, the logarithmic plot of the type shown in Figure 2b will tend to be curved, and the corresponding diffusion coefficient will be a z-average (Pusey, 1974). The spread of diffusion coefficients is indicated by a parameter known as the "polydispersity factor" (Pusey, 1974), which most software packages evaluate. Various computer packages are available from the commercial manufacturer for data capture and evaluation.
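One common way of quantifying the spread of diffusion coefficients is a second-order cumulant fit, in which ln[g(2)(τ) - 1] is fitted to a quadratic in τ and the ratio μ2/Γ² is reported as a polydispersity index. Whether this is exactly the "polydispersity factor" computed by PROTEPS or by a given commercial package is an assumption, not something established here; the sketch follows the normalization of equation 3 as written, and all names are hypothetical.

```python
import math


def _solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def cumulant_fit(taus_s, g2):
    """Fit ln(g2 - 1) = c0 - Gamma*tau + (mu2/2)*tau^2; return (Gamma, mu2/Gamma^2)."""
    t_max = max(taus_s)
    # Scaled variable x = tau/t_max keeps the normal equations well conditioned.
    pts = [(t / t_max, math.log(g - 1.0)) for t, g in zip(taus_s, g2) if g > 1.0]
    s = [sum(x ** p for x, _ in pts) for p in range(5)]
    rhs = [sum(y * x ** p for x, y in pts) for p in range(3)]
    normal = [[s[i + j] for j in range(3)] for i in range(3)]
    _, q1, q2 = _solve3(normal, rhs)
    gamma = -q1 / t_max            # mean decay rate
    mu2 = 2.0 * q2 / t_max ** 2    # second cumulant
    return gamma, mu2 / gamma ** 2


if __name__ == "__main__":
    taus = [i * 5.0e-6 for i in range(1, 41)]
    # Synthetic decay with Gamma = 1e4 s^-1 and mu2/Gamma^2 = 0.1 (hypothetical).
    g2 = [1.0 + math.exp(-1.0e4 * t + 0.5 * 1.0e7 * t * t) for t in taus]
    print(cumulant_fit(taus, g2))
```

For a monodisperse sample the index is close to zero; CONTIN-type inversions, mentioned next, go further and return a full size distribution.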
In our laboratory, we prefer to capture the data in ASCII format using the data capture software of the commercial manufacturer and then to use our own in-house routine "PROTEPS" (Harding et al., 1997) for the evaluation of diffusion coefficients and polydispersity factors. More advanced routines are available, including "CONTIN," which was designed for the study of heterogeneous systems: it goes beyond the use of polydispersity factors by inverting the autocorrelation data directly to give distributions of particle size. These methods have recently been reviewed (Johnsen and Brown, 1992; Stepanek, 1993). Dynamic light scattering is particularly valuable for investigating changes in macromolecular systems, provided the timescale of the changes is of the order of minutes or hours rather than seconds or less (Harding, 1986). Finally, it is worth pointing out that dynamic light scattering also provides a useful tool for measuring electrophoretic mobilities (Langley, 1992), and commercial instrumentation is available for this purpose.
Sedimentation Velocity in the Analytical Ultracentrifuge
Combination of the sedimentation coefficient, s, from sedimentation velocity with the diffusion coefficient, D, from dynamic light scattering gives an absolute value for the molar mass of a protein without assumptions about conformation. This method of molar mass measurement was introduced by T. Svedberg (see Svedberg and Pedersen, 1940).
The basic principle of the technique is as follows. A solution of the protein is placed in a specially designed sector-shaped cell with transparent end windows. This in turn is placed in an appropriately balanced rotor and run in high vacuum at the appropriate speed (typically ≈50000-60000 rev/min for a protein of molar mass 10000-100000 g/mol; lower speeds for larger molecules). A light source positioned below the rotor transmits light via a monochromator or filter through the solution and a variety of optical components. The moving boundary formed as the protein sediments can then be recorded at appropriate time intervals on photographic film, on chart paper, or as digital output fed directly into a PC. Measurement of the rate of movement of the boundary (per unit centrifugal field) enables evaluation of the sedimentation coefficient. (For an introduction, see van Holde, 1985; for more detail, see two recent books: Harding et al., 1992a; Schuster and Laue, 1994.) Three principal optical systems can be employed: (i) absorption optics (in the range 200-700 nm), (ii) "Schlieren" refractive index gradient optics, and (iii) Rayleigh interference optics. The simplest is the absorption system, and the only analytical ultracentrifuge currently available commercially is based around it; we describe its operation here. Use of the other optical systems requires more specialized knowledge, and the interested protein chemist really needs to consult an expert.
Use of an Analytical Ultracentrifuge with a Scanning Absorption Detection System and On-line Data Capture to a PC
Optics
Double sector cells are used with the solution (0.2-0.4 ml) in one sector and the reference buffer or solvent in the other, the latter filled to a slightly higher level to avoid complications caused by the signal from the solvent meniscus. The scanning system subtracts the absorption of the reference buffer from that of the solution. Electronic multiplexing allows multihole rotors to be used so that several samples can be run at a time. Examples of sedimenting boundaries recorded using absorption optics are shown in Figure 3a. Figure 3a (top) is for a highly purified preparation of an enzyme (methylmalonyl mutase). Figure 3a (lower) is for a heterogeneous preparation of a DNA-binding protein (Pf1) with a macromolecular component and a fast-moving aggregate; the virtue of the technique for assaying the purity of a preparation (from the number and asymmetry of the boundaries in a given scan) can be seen directly. Although commercial software is available for identifying the center of the sedimenting boundary (strictly, the "second moment" of the boundary is more appropriate, but in practice there is no real difference), the simplest way in practice is (i) to plot out the boundaries (recorded at appropriate time intervals) using a high-resolution printer or plotter and graphically draw a line through the user-identified boundary centers, and then (ii) to use a graphics tablet to recapture the radial positions of the boundary centers. Computer routines such as XLA-VEL (Colfen and Harding, unpublished) yield the sedimentation coefficient and a correction to the loading concentration for the average radial dilution during the run (caused by the sector shape of the cell channels). Other available routines are based on the total concentration distribution, such as SVEDBERG (Philo, 1994), or on measurement of the apparent distribution of sedimentation coefficients, g(s), such as DCDT (Stafford, 1992). The sedimentation coefficient, s, equals the rate of movement of the boundary per unit centrifugal field, that is

$$s = \frac{dr/dt}{\omega^2 r} \qquad (7)$$
where r is the radial position of the boundary at time t and ω is the angular velocity in rad/s (= rpm × 2π/60). For a small globular protein with a sedimentation coefficient of about 2 Svedbergs (S, where 1 S = 10^-13 s), a rotor speed of 50000 rpm will give a measurable set of optical records after some hours. For larger protein systems (e.g., 12S globulins or 30S ribosomes), speeds of <30000 rpm are appropriate. The standard temperature at which sedimentation coefficients are quoted is now 20.0 °C (sometimes 25.0 °C). If the protein is thermally unstable, temperatures down to around 4 °C can be used without difficulty. The concentration used depends on the extinction coefficient of the protein. The lower the protein concentration the better, since this minimizes problems of nonideality. For proteins of average extinction at 280 nm (≈500 ml g^-1 cm^-1), concentrations as low as 0.2 mg/ml are possible with the standard 12 mm optical path length cells. This limit can be pushed even lower if the peptide bond wavelength (210-230 nm) is used and the buffer is transparent. For absorbances greater than 3, shorter path length cells need to be employed instead (minimum ≈3 mm; below this, cell window problems become significant), or "off-maximum" wavelengths used (with caution), or, more desirably, a different optical system used (interference or Schlieren). For each concentration used, the sedimentation coefficient, s, is corrected to standard conditions of buffer/solvent density and viscosity (water at 20.0 °C):

$$s_{20,w} = s\cdot\frac{\eta}{\eta_{20,w}}\cdot\frac{1 - \bar{v}\rho_{20,w}}{1 - \bar{v}\rho} \qquad (8)$$
where η and ρ are the viscosity and density of the solvent, and η20,w and ρ20,w are those of water at 20.0 °C. Knowledge of a parameter known as the "partial specific volume," v̄ (essentially the reciprocal of the anhydrous macromolecular density), is needed; for proteins this can usually be obtained from amino acid composition data (Perkins, 1986) or measured with a precision density meter (Kratky et al., 1973). Typically, v̄ ≈ 0.73 ml/g for proteins.
Extrapolation to Zero Concentration
As with the diffusion coefficient, s20,w is plotted versus c (the latter corrected for radial dilution) and extrapolated (usually linearly) to zero concentration (Figure 3b) to give a parameter, s°20,w, which can be directly related to the frictional properties of the macromolecule (the so-called "frictional ratio") and from which size and shape information can be inferred. (If the protein is very asymmetric or solvated, plotting 1/s20,w versus c generally gives a more useful extrapolation.) The downward slope of a plot of s20,w versus concentration is a result of nonideality and is characterized by the "Gralén" parameter, ks, in the equation

$$s_{20,w} = s^\circ_{20,w}\,(1 - k_s c) \qquad (9)$$

ks depends on the nonideality of the system and hence on the size, shape, and charge of the protein. If the solvent is of sufficient ionic strength, I, the charge effects can be suppressed. The molar mass, M, can then be found by combining s°20,w with D°20,w in the Svedberg equation (Svedberg and Pedersen, 1940):

$$M = \frac{s^\circ_{20,w}}{D^\circ_{20,w}}\cdot\frac{RT}{1 - \bar{v}\rho_{20,w}} \qquad (10)$$

where R is the gas constant.
Figure 3. Sedimentation velocity in the analytical ultracentrifuge using scanning absorption optics. (a) Sedimentation "diagrams": methylmalonyl mutase, c = 0.7 mg/ml. Monochromator wavelength = 295 nm; scan interval 9 min; rotor speed 44000 rev/min; temperature = 20.0 °C; measured s20 = (7.14 ± 0.04) S. (b) Sedimentation diagrams: gene 5 DNA-binding protein, c = 0.7 mg/ml. Monochromator wavelength = 278 nm; scan interval 8 min; rotor speed 40000 rev/min; temperature = 20.0 °C; s20,w = (35.5 ± 1.4) S (faster boundary) and (2.6 ± 0.1) S (slower boundary). (c) Sedimentation coefficient s20,w versus concentration plot for an antibody (rat IgE); s°20,w = (7.92 ± 0.06) S.
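The sedimentation velocity workflow of equations 7 to 10 can be summarized in a short Python sketch: s from a straight-line fit of ln r against t (equation 7 integrated at constant ω, ignoring rotor acceleration), correction to standard conditions (equation 8), linear extrapolation to zero concentration (equation 9), and combination with D°20,w (equation 10). The water viscosity and density values, CGS constants, function names, and example numbers are assumptions for illustration; the last function shows why an error in v̄ is roughly tripled in M for proteins.

```python
import math

R = 8.314462618e7   # gas constant, erg mol^-1 K^-1 (CGS units)
ETA_20W = 0.01002   # viscosity of water at 20.0 °C, poise (approximate)
RHO_20W = 0.99821   # density of water at 20.0 °C, g/ml (approximate)


def _line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xm) ** 2 for x in xs)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, ym - slope * xm


def sedimentation_coefficient(times_s, radii_cm, rpm):
    """Equation 7 integrated at constant speed: ln r = ln r0 + s*omega^2*t."""
    omega = rpm * 2.0 * math.pi / 60.0   # rad/s
    slope, _ = _line(times_s, [math.log(r) for r in radii_cm])
    return slope / omega ** 2            # seconds; divide by 1e-13 for Svedbergs


def s_20w(s, eta, rho, vbar):
    """Equation 8: correct s measured in a buffer (eta, rho) to water at 20.0 °C."""
    return s * (eta / ETA_20W) * (1.0 - vbar * RHO_20W) / (1.0 - vbar * rho)


def extrapolate_s(concs, s_values):
    """Linear fit of equation 9; returns (s°20,w, ks), ks in reciprocal units of c."""
    slope, intercept = _line(concs, s_values)
    return intercept, -slope / intercept


def svedberg_mass(s0_20w, d0_20w, vbar, temp_k=293.15):
    """Equation 10 with s in seconds and D in cm^2/s; returns M in g/mol."""
    return (s0_20w / d0_20w) * R * temp_k / (1.0 - vbar * RHO_20W)


def vbar_error_factor(vbar, rho=RHO_20W):
    """Fractional error in M per fractional error in vbar: vbar*rho / (1 - vbar*rho)."""
    return vbar * rho / (1.0 - vbar * rho)


if __name__ == "__main__":
    # Hypothetical values of the order of those for serum albumin.
    print("M =", svedberg_mass(4.4e-13, 5.9e-7, 0.734), "g/mol")
    print("error amplification for vbar = 0.73:", vbar_error_factor(0.73))
```

With v̄ ≈ 0.73 ml/g the amplification factor is about 2.7, consistent with the ±1% to ±3% rule of thumb quoted next.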
An accurate estimate of v̄, as described above, is normally required because, for proteins, errors are approximately tripled: for example, an error of ±1% in v̄ results in an error of ±3% in M. This means that care has to be taken if the protein is glycosylated, since v̄ for carbohydrate is typically ≈0.6 ml/g. For a heterogeneous system, s°20,w will be a weight average and D°20,w a z-average; the M calculated will also be a weight average (Pusey, 1974), distinguishing it from the molar mass obtained by osmometry (see Tombs and Peacocke, 1974), which yields a number average. A further estimate can be obtained by combining s°20,w with ks alone (Rowe, 1977):

$$M = \frac{\left(6\pi\eta_{20,w}\,s^\circ_{20,w}\,N_A\right)^{3/2}}{\left(1 - \bar{v}\rho_{20,w}\right)^{3/2}}\left\{\frac{3\bar{v}}{4\pi N_A}\left[\frac{k_s}{2\bar{v}} - \frac{v_s}{\bar{v}}\right]\right\}^{1/2} \qquad (11)$$

where NA is Avogadro's number and vs is a specific volume allowing for hydration of the protein; since the ratio (vs/v̄) is usually small in comparison with (ks/2v̄), an approximate estimate normally suffices. This method has given reliable values for standard protein molecules of known molar mass (Rowe, 1977). ks itself is a valuable parameter for shape measurement, as discussed below. The form of the concentration dependence can also be used as an assay for self-associating systems (Rowe, 1977), although sedimentation equilibrium methods (see below) are usually superior.
Sedimentation Equilibrium
The "sedimentation-diffusion" method (equation 10) for obtaining molar mass, although absolute, is rather inconvenient in requiring two sets of measurements. A simpler method is to use the analytical ultracentrifuge by itself with the technique known as sedimentation equilibrium; this is probably the method of choice for molar mass determination of intact protein assemblies, and particularly for the investigation of interacting protein systems (Schachman, 1989). The same instrument and optical system(s) as for sedimentation velocity are used, the principal differences being (i) the much lower rotor speeds employed, (ii) the longer run times, and (iii) the shorter solution (and buffer) columns in the ultracentrifuge cell, and hence the smaller amount of material required.
Sedimentation equilibrium, unlike sedimentation velocity, gel filtration, and dynamic light scattering, is not a transport method. In a sedimentation equilibrium experiment, the rotor speed is chosen to be low enough that the forces of sedimentation and diffusion on the macromolecular solute become comparable, allowing an equilibrium distribution of solute to be attained. This equilibrium can be established after a period of 2 to 96 hours, depending on the macromolecule, the solvent, and the run conditions. Since there is no net transport of solute at equilibrium, recording and analysis of the final equilibrium distribution (Figure 4) gives an absolute estimate of the molar mass and associated parameters, because frictional (i.e., shape) effects are not involved. In this description, we again refer only to the absorption system, because of its simplicity and availability, for recording the distribution of solute in the ultracentrifuge cell (this time an equilibrium distribution). The most accurate method is in fact the interference system, but this requires considerably more expertise to operate correctly (see van Holde, 1985; Harding et al., 1992a; Schuster and Laue, 1994). Compared with sedimentation velocity, the concentration and volume requirements for the macromolecular solute depend more critically on the extinction coefficient of the protein. As with sedimentation velocity and dynamic light scattering, the lower the protein concentration the better, since this minimizes problems of thermodynamic nonideality. At higher concentrations (necessary if possible associative phenomena are being investigated, for example at the concentrations used for NMR measurements), the limitation is the Lambert-Beer law: the proportionality c ∝ absorbance (A) fails above absorbances of about 1.4 to 1.5.
For concentrations of 1 mg/ml and above, shorter path length cells need to be employed, or an ultracentrifuge with Schlieren optics used. Volume requirements are lower than for sedimentation velocity: generally 0.1 to 0.2 ml. The longer the column, the greater the precision and the more information that can be extracted; the shorter the column, the more quickly equilibrium is reached. Experimental times can be long. For molecules of M < 10000 g/mol, <24 h is required; larger, slower-diffusing molecules take 48 to 72 h, although for the latter the time to equilibrium can be decreased by initial "overspeeding," that is, running at a higher speed for a few hours before setting the final equilibrium speed. In some applications it may be desirable to use shorter columns (as low as 0.5 mm); although the accuracy of the molar masses will be lower, this "short column" method offers the advantage of fast equilibrium (a few hours) (Correia and Yphantis, 1992), which may be important if many samples need to be run and/or the macromolecule is relatively unstable. As with sedimentation velocity, a temperature of 4 °C can be used without difficulty. If scanning absorption optics are used, equilibrium patterns such as those in Figure 4 can be read directly into an attached PC. As with sedimentation velocity, cells can be run together in multihole rotors and electronically multiplexed. In addition, special multichannel cells containing three solution/solvent pairs can be used, as illustrated in Figure 4. So for a four-hole ultracentrifuge rotor (with one hole needed for the counterpoise, which carries reference slits for calibrating radial positions in the cell), nine solutions can be run simultaneously. Before interpretation in terms of molar mass, a baseline is normally required. After the final equilibrium pattern has been recorded (equilibrium is checked by comparing scans separated by a few hours), the rotor is run for a short time at a higher speed (up to 60000 rev/min or the upper limit for a particular centerpiece) to deplete the solution, or at least the meniscus region, of solute: the residual absorbance gives the baseline correction (absorbance of nonmacromolecular species). This is not so easy with small proteins, whose equilibrium speed will be quite high anyway: careful dialysis of the solution against the reference solvent before the run (and use of the dialysate as reference) may be necessary.
Figure 4. Sedimentation equilibrium profiles for β-lactoglobulin B; abscissa: radius, r (cm). Absorption optics, wavelength = 280 nm. Rotor speed = 15000 rev/min, temperature = 20.0 °C. A multichannel cell (12 mm optical path length) was used, allowing three solution/solvent pairs with ≈0.12 ml in the solvent channels and ≈0.10 ml in the solution channels. Inner profile: loading concentration c = 0.1 mg/ml; middle: 0.2 mg/ml; outer: 0.3 mg/ml. Because of restrictions from the Lambert-Beer law, only absorbances <1.5 could be used with the outer channel. This difficulty could be offset by using a higher wavelength. With the inner channel, the signal could be increased by using far-uv optics (210-230 nm).
The average slope of a plot of ln A versus r², where r is the radial distance from the center of the rotor, yields the molar mass:

$$M = \frac{2RT}{(1 - \bar{v}\rho)\,\omega^2}\cdot\frac{d\ln A}{d(r^2)} \qquad (12)$$
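Equation 12 and the sliding-strip procedure described below can be sketched as follows, assuming baseline-corrected absorbance data A(r). The whole-cell function takes one straight-line fit of ln A against r² over the whole column; the point function repeats the fit over a moving window of a few points to give Mw,app(r). Names, window size, and units (CGS, R in erg mol^-1 K^-1) are illustrative assumptions; this is not MSTAR or any other named package.

```python
import math

R = 8.314462618e7  # gas constant, erg mol^-1 K^-1 (CGS units)


def _line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xm) ** 2 for x in xs)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, ym - slope * xm


def mass_from_slope(slope, vbar, rho, rpm, temp_k=293.15):
    """Equation 12: M = 2RT / ((1 - vbar*rho) * omega^2) * dlnA/d(r^2)."""
    omega = rpm * 2.0 * math.pi / 60.0
    return slope * 2.0 * R * temp_k / ((1.0 - vbar * rho) * omega ** 2)


def whole_cell_mass(radii_cm, absorbances, vbar, rho, rpm, temp_k=293.15):
    """Average slope of ln A versus r^2 over the whole column (an apparent Mw)."""
    slope, _ = _line([r * r for r in radii_cm], [math.log(a) for a in absorbances])
    return mass_from_slope(slope, vbar, rho, rpm, temp_k)


def point_masses(radii_cm, absorbances, vbar, rho, rpm, window=7, temp_k=293.15):
    """Sliding-strip local slopes; returns (central radius, Mw,app(r)) pairs."""
    result = []
    for i in range(len(radii_cm) - window + 1):
        strip_r = radii_cm[i:i + window]
        strip_a = absorbances[i:i + window]
        slope, _ = _line([r * r for r in strip_r], [math.log(a) for a in strip_a])
        result.append((strip_r[window // 2], mass_from_slope(slope, vbar, rho, rpm, temp_k)))
    return result


if __name__ == "__main__":
    # Synthetic ideal single-species profile (hypothetical values).
    m_true, vbar, rho, rpm = 18400.0, 0.751, 0.99821, 15000
    omega = rpm * 2.0 * math.pi / 60.0
    radii = [6.9 + 0.01 * i for i in range(31)]
    a = [0.2 * math.exp(m_true * (1 - vbar * rho) * omega ** 2 * (r * r - 6.9 ** 2)
                        / (2 * R * 293.15)) for r in radii]
    print(whole_cell_mass(radii, a, vbar, rho, rpm))
```

For an ideal single species every strip returns the same value; a systematic rise of Mw,app(r) towards the cell base is the signature of heterogeneity discussed next.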
At finite concentrations, this will be an apparent molar mass (because of the effects of thermodynamic nonideality; see below). For macromolecular systems of M < 100,000 g/mol in aqueous solvents of reasonable ionic strength (0.05 M and above), however, these effects are small at loading concentrations of 0.5 mg/ml and less, and in these cases it is reasonable to assume $M_{\mathrm{app}} \approx M$. If the protein solution is heterogeneous (containing interacting or noninteracting species of different molar mass), then the plot of ln A versus r² will be curved upwards. This situation occurs with self-associating systems and with heavily glycosylated protein systems such as mucus glycoproteins. In this case, the data can be treated in one of two ways:

(i) An average slope can be obtained, which, as with equation 12, yields the weight average molar mass, $M_w$. For strongly curving systems, or for systems where the cell base is not clearly defined, a procedure that uses a function known as M* (Creeth and Harding, 1982; Harding et al., 1992b) is useful for this purpose.

(ii) Local slopes along the ln A versus r² curve can be obtained using a sliding strip procedure (Teller, 1973), giving what are called apparent "point" weight average molar masses, $M_{w,\mathrm{app}}(r)$, as a function of radial position (or of the equivalent local concentration or absorbance). This procedure is particularly useful for investigating self-association phenomena and other types of heterogeneity. It also provides a method for extracting the z-average molar mass:

$$M_{z,\mathrm{app}} = \frac{M_{w,\mathrm{app}}(r_b)\,A(r_b) - M_{w,\mathrm{app}}(r_a)\,A(r_a)}{A(r_b) - A(r_a)} \tag{13}$$
where $r_a$ and $r_b$ are the radial positions of the solution meniscus and the cell base, respectively, and $M_{z,\mathrm{app}} \to M_z$ as the concentration (in absorbance units, A) → 0. The ratio $M_z/M_w$ can be used as an index of the heterogeneity of the sample. For noninteracting systems, it measures the inherent polydispersity of the system, which is particularly relevant to the study of heavily glycosylated systems. If the system is self-associating or involved in "heterologous" association (i.e., complex formation), then the stoichiometry and strength of an interaction can be assayed from any of three plots: A(r) versus r (Figure 5a), $M_{w,\mathrm{app}}(r)$ versus A(r), or $M_{w,\mathrm{app}}$ versus c for different loading concentrations, c. Several commercial software packages are available (see Colfen et al., 1997). Assays are also available for distinguishing a self-associating system from a mixture of noninteracting species (Roark and Yphantis, 1969).
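The sliding strip procedure and equation 13 can be sketched as follows. This is a minimal illustration that assumes ideal behaviour and uses a plain least-squares slope in each strip. It is not the Teller (1973) or MSTAR implementation, and the function names, the strip width and the synthetic two-species mixture are hypothetical. The first and last strip centres stand in for $r_a$ and $r_b$, which only approximates the true meniscus and base values.

```python
import math

R_GAS = 8.314462e7  # erg mol^-1 K^-1 (cgs)


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def point_mw(radii_cm, absorb, rpm, temp_k, vbar, rho, window=11):
    """Sliding-strip M_w,app(r): local slopes of ln A vs r^2 over an odd-sized window.

    Returns a list of (index of strip centre, M_w,app) pairs.
    """
    omega = 2.0 * math.pi * rpm / 60.0
    factor = 2.0 * R_GAS * temp_k / ((1.0 - vbar * rho) * omega ** 2)
    r2 = [r * r for r in radii_cm]
    ln_a = [math.log(a) for a in absorb]
    half = window // 2
    return [(i, factor * slope(r2[i - half:i + half + 1], ln_a[i - half:i + half + 1]))
            for i in range(half, len(radii_cm) - half)]


def mz_app(absorb, points):
    """Equation 13, using the first and last strip centres in place of r_a and r_b."""
    (ia, mw_a), (ib, mw_b) = points[0], points[-1]
    return (mw_b * absorb[ib] - mw_a * absorb[ia]) / (absorb[ib] - absorb[ia])


def mixture_profile(species, radii_cm, rpm, temp_k, vbar, rho):
    """Synthetic ideal noninteracting mixture; species = [(M, absorbance at first radius), ...]."""
    omega = 2.0 * math.pi * rpm / 60.0
    r0 = radii_cm[0]
    return [sum(a0 * math.exp(m * (1.0 - vbar * rho) * omega ** 2 * (r * r - r0 * r0)
                              / (2.0 * R_GAS * temp_k))
                for m, a0 in species)
            for r in radii_cm]


radii = [6.9 + 0.002 * i for i in range(101)]
args = (15000, 293.15, 0.73, 0.998)  # rpm, T (K), vbar (ml/g), rho (g/ml); hypothetical
mix = mixture_profile([(20000.0, 0.3), (60000.0, 0.1)], radii, *args)
pts = point_mw(radii, mix, *args)
print(round(pts[0][1]), round(pts[-1][1]), round(mz_app(mix, pts)))
```

For the mixture, the point averages rise from meniscus to base, and $M_{z,\mathrm{app}}$ exceeds them. This is the upward-curvature signature of heterogeneity described above.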
STEPHEN E. HARDING
[Figure 5 plots: (a) absorbance A(r) versus radius (cm); (b) reciprocal apparent molar mass versus c(r) (mg/ml).]
Figure 5. Analysis of sedimentation equilibrium data. (a) Self-association: absorbance A(r) versus radial displacement r for protein disulphide isomerase (PDI). Rotor speed = 12000 rpm, temperature = 4 °C, loading concentration c = 0.4 mg/ml. The fitted line is for a reversible ideal dimerization with dissociation constant Kd = 180 μM (from Darby et al., 1997). (b) Thermodynamic nonideality: reciprocal point (apparent) weight average molar mass, $1/M_{w,\mathrm{app}}(r)$, where each point corresponds to a radial position r, plotted versus the local concentration c(r), for turnip-yellow mosaic virus (TYMV). $M_w$ (from extrapolation to zero concentration) = (5.8 ± 0.2) × 10⁶ g/mol. Adapted from Harding and Johnson (1985).
For larger macromolecules (M > 100,000 g/mol), such as protein assemblies and heavily glycosylated systems, and/or for more concentrated solutions, nonideality (through macromolecular exclusion and any unsuppressed charge effects) may become significant. This tends to cause downward curvature in the ln A versus r² plots, which can often obscure heterogeneity phenomena. The two effects (nonideality and heterogeneity) can occasionally cancel to give a misleadingly linear plot, a problem that can be avoided by running at more than one loading concentration. If the solution is not significantly heterogeneous, the point (apparent) molar masses from a single experiment can simply be extrapolated to zero concentration (absorbance) to give the infinite dilution "ideal" value (reciprocals are usually plotted; see Figure 5b). Otherwise, several sedimentation equilibrium experiments must be performed at different loading concentrations, c, and the "whole cell" molar masses $M_{w,\mathrm{app}}$ extrapolated to zero concentration. As regards modern computing packages, software currently available from the commercial manufacturer tends to require an assumed model prior to the analysis (ideal monomer, self-association, nonideal self-association, etc.). We find a general package that does not require an assumed model useful: MSTAR (Harding et al., 1992b), now available for the PC (Colfen and Harding, 1997). This program evaluates $M_{w,\mathrm{app}}$ (using the M* function), $M_{w,\mathrm{app}}(r)$ versus r or A, and also $M_{z,\mathrm{app}}(r)$ if the data are of sufficiently high quality. After these model-independent analyses have been performed, resort can then be made to the more specialized packages (self-association, polydispersity, etc.).
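The reciprocal extrapolation can be sketched as a straight-line fit. This assumes the first-order nonideality relation 1/M_app = 1/M + 2Bc, where B is the second thermodynamic virial coefficient introduced later in this chapter, with higher virial terms neglected. The same fit applies whether the points are $M_{w,\mathrm{app}}(r)$ versus c(r) from one run or whole-cell values from several loading concentrations. The function name and the numbers below are hypothetical and synthetic, not measured data.

```python
def extrapolate_reciprocal(conc_g_per_ml, m_app):
    """Fit 1/M_app = 1/M + 2*B*c by least squares (first-order nonideality).

    Returns (M, B): M in g/mol, B in ml mol g^-2 when c is in g/ml.
    """
    ys = [1.0 / m for m in m_app]
    n = len(conc_g_per_ml)
    mx = sum(conc_g_per_ml) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in conc_g_per_ml)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc_g_per_ml, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return 1.0 / intercept, slope / 2.0


# Synthetic data for a hypothetical solute with M = 5.8e6 g/mol and B = 4e-6 ml mol g^-2.
concs = [0.0005, 0.0010, 0.0015, 0.0020, 0.0025]  # g/ml (0.5 to 2.5 mg/ml)
m_app = [1.0 / (1.0 / 5.8e6 + 2.0 * 4e-6 * c) for c in concs]
m_ideal, b = extrapolate_reciprocal(concs, m_app)
print(f"M = {m_ideal:.3g} g/mol, B = {b:.2g} ml mol g^-2")
```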
SHAPE MEASUREMENT
Hydrodynamic methods provide a relatively quick way to acquire average or "gross" conformation information about proteins and protein assemblies, and in some cases give rather detailed representations, as for T-even bacteriophages and antibodies. Limited information on flexibility can also be obtained. Such information may seem "low-resolution" compared with that from the powerful structural probes of x-ray crystallography and high-resolution NMR. It should be borne in mind, however, that the latter are sometimes not applicable, for two reasons. (i) The high aggregation-free concentrations necessary for high-resolution NMR may not be attainable for a given protein system or assembly. (ii) The protein or protein assembly may not be crystallizable, or molecular flexibility may obscure attempts to interpret electron density maps; this is why crystallographers have had considerable difficulty in determining the structure of intact, immunologically active antibody molecules. In both these cases, hydrodynamic methods are particularly valuable. First, they can monitor possible associative behavior at higher concentration (using any of the techniques above, particularly sedimentation velocity and equilibrium). Second, they can provide conformation information on the protein or protein assembly in solution under near in vivo conditions. This can be either in terms of an overall shape or in terms of refinement of a crystal structure of a protein or of an electron microscopic structure of a protein assembly (arrangement of subunits). Antibodies are a good example; a useful early attempt was made by Gregory and colleagues (1987).
Modelling Strategies: Spheres, Ellipsoids, Beads, and Bends
Hydrodynamic representations of protein shape use models that progress in sophistication from a sphere to bead models (Figure 6). The simplest is the equivalent hydrodynamic (or "Stokes") sphere, of radius $r_H$ (cf. equation 6). The next step toward a better representation is the ellipsoid of revolution, an ellipsoid with two equal axes. There are two types: the prolate ellipsoid (cigar shape), with two equal minor axes, and the oblate ellipsoid (discoid), with two equal major axes. Both are characterized by the axial ratio a/b, with semiaxes a > b. In the limit a ≫ b, the prolate ellipsoid becomes a rod and the oblate a disc. The next step in sophistication is the general triaxial ellipsoid, with semiaxes a ≥ b ≥ c and axial ratios {a/b, b/c}. In the limits it becomes a sphere {a/b = 1, b/c = 1}, an oblate ellipsoid {a/b = 1}, or a prolate ellipsoid {b/c = 1}, the latter two going to discs and rods, respectively. Another extreme of the general ellipsoid is the tape (a ≫ b ≫ c). The final degree of sophistication is the bead model: many macromolecules, such as antibodies and multisubunit proteins, are difficult to represent by symmetric shapes like ellipsoids.
[Figure 6 panels: (a) prolate and oblate ellipsoids of revolution; (b) triaxial ellipsoid limits, e.g. rod (a ≫ b = c); (c) bead models (i), (ii), (iii).]
Figure 6. Hydrodynamic models for conformation. (a) Ellipsoids of revolution (adapted from Tanford, 1961). (b) General triaxial ellipsoids. (c) Bead models. (i) T-even bacteriophage in slow (s) and fast (f) forms ($s^0_{20,w}$ = 710S and 1020S, respectively), modelled on sedimentation and diffusion coefficient data; from Garcia de la Torre (1989). (ii) C1 complex from the complement system, modelled on sedimentation coefficient and $R_g$ data; from Perkins (1989). (iii) Cyclic AMP receptor associated with 80-bp DNA. Only maximum bending of the DNA reproduces the measured rotational diffusion decay constant (from electric dichroism decay); from Porschke and Antosiewicz (1989).
Bead modelling (arrays of touching or overlapping spheres) allows very sophisticated shapes to be represented. A successful variant of this is bead-shell modelling, in which the surface of the macromolecule is represented by beads. Filling strategies, however, such as those based on crystallographic coordinates, have been shown to be unreliable (Carrasco, 1998). As the degree of sophistication increases, so does the uniqueness problem. A model may be consistent with a particular measured hydrodynamic parameter, such as a sedimentation coefficient $s^0_{20,w}$ or a radius of gyration $R_g$ (from solution x-ray scattering or light scattering; see, for example, Van Holde, 1985), but so may other models. For example, a single value for the sedimentation coefficient can correspond to one equivalent sphere, two ellipsoids of revolution (one prolate, one oblate), a line solution of triaxial ellipsoids, and an almost infinite number of bead models.

From ellipsoids upward there is a further problem: hydration, the amount of buffer or solvent associated with the protein (chemically bound or physically entrapped). Hydration also contributes to $s^0_{20,w}$, among other things, and has to be measured separately, assumed, or eliminated by a combination of measurements. As the model becomes more sophisticated, more independent measurements are needed to give a "unique" answer: two for ellipsoids of revolution and three for triaxial ellipsoids. Bead modelling is normally performed with at least two hydrodynamic measurements, popularly $s^0_{20,w}$ or $D^0_{20,w}$ and $R_g$, although rotational probes have also been used (Antosiewicz and Porschke, 1989; Porschke and Antosiewicz, 1989). It is best used to refine a structure from crystallography or to select between certain plausible structures. A further refinement of bead modelling is the modelling of molecular flexibility, the bending or "segmental flexibility" of molecules. Details of this and its application to flexibility phenomena in myosin can be found in Garcia de la Torre and Bloomfield (1977) and Garcia de la Torre (1989, 1992, 1994). Finally, bead and bead-shell strategies have been developed that are based on shape alone, without the ambiguities caused by size (Garcia de la Torre et al., 1997).
Intrinsic Viscosity
The simplest hydrodynamic conformation measurement is the intrinsic viscosity. The classical reference on the theory and practice of protein viscometry is an article by J.T. Yang (1961); a more recent review has been written by the present author (Harding, 1997). Adding a macromolecular solute increases the viscosity of an aqueous solvent to an extent depending on (i) the concentration, (ii) the size (including the degree of hydration), and (iii) the shape. Increasing concentration, increasing size, and increasing asymmetry of shape all increase the viscosity of a solution. Viscosity measurements on proteins in dilute solution are normally performed in a capillary (or "Ostwald") viscometer. The flow time under gravity (between two reference points) of the solution (t) is compared with that of the solvent ($t_0$). Differential microviscometers based on a pressure imbalance principle also appear highly promising (Haney, 1985). With conventional capillary viscometers, pumping of the liquid and timing are now usually done automatically by photodetectors (using, for example, a Schott-Geräte (Hofheim, Germany) system). Because viscosity is a sensitive function of temperature, a water bath is required, with the temperature controlled and measured to within ±0.005 °C. From the flow times (averaged over consistent measurements), the relative viscosity $\eta_r$ is obtained from

$$\eta_r = \left(\frac{t}{t_0}\right)\left(\frac{\rho}{\rho_0}\right) \tag{14}$$
with $(\rho/\rho_0)$ the ratio of the solution to solvent density. This can be measured separately for each concentration using a precision density meter (Kratky et al., 1973; Rowe, 1978). More conveniently, the measurement can be avoided by using a kinematic relative viscosity, $\eta'_r = (t/t_0)$, and applying a correction factor in the data analysis (see below) (Tanford, 1955). A (kinematic) reduced specific viscosity is then defined:

$$\eta'_{\mathrm{red}} = \frac{\eta'_r - 1}{c} \tag{15}$$
so if c is in g/ml, $\eta'_{\mathrm{red}}$ is in ml/g. To eliminate nonideality effects, $\eta'_{\mathrm{red}}$ is measured at a series of concentrations and extrapolated to zero concentration to yield the (kinematic) intrinsic viscosity $[\eta']$. This can then be corrected for density to give the ("dynamic") intrinsic viscosity $[\eta]$:

$$[\eta] = \frac{1 - \bar{v}\rho_0}{\rho_0} + [\eta'] \tag{16}$$
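Equations 14 to 16 chain together as sketched below. The sketch assumes a simple straight-line (Huggins-type) extrapolation of $\eta'_{\mathrm{red}}$ to c = 0. The function names and the synthetic flow times, generated from an assumed $\eta'_{\mathrm{red}} = 3.0 + 18c$, are hypothetical.

```python
def kinematic_reduced_viscosity(t, t0, c):
    """Equation 15 with eta'_r = t/t0: (eta'_r - 1)/c, in ml/g if c is in g/ml."""
    return (t / t0 - 1.0) / c


def intrinsic_viscosity(flow_times, t0, concs, vbar, rho0):
    """Extrapolate eta'_red linearly to c = 0, then apply the density correction (eq. 16).

    Returns ([eta'], [eta]), both in ml/g.
    """
    reds = [kinematic_reduced_viscosity(t, t0, c) for t, c in zip(flow_times, concs)]
    n = len(concs)
    mx, my = sum(concs) / n, sum(reds) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(concs, reds))
             / sum((x - mx) ** 2 for x in concs))
    kinematic = my - slope * mx  # intercept at c = 0, i.e. [eta']
    return kinematic, kinematic + (1.0 - vbar * rho0) / rho0


# Synthetic flow times (hypothetical values, not measurements).
t0 = 100.0                            # solvent flow time, s
concs = [0.002, 0.004, 0.006, 0.008]  # g/ml
times = [t0 * (1.0 + c * (3.0 + 18.0 * c)) for c in concs]
print(intrinsic_viscosity(times, t0, concs, vbar=0.73, rho0=0.99823))
```

The density correction adds $(1 - \bar{v}\rho_0)/\rho_0$, about 0.27 ml/g for a typical protein, which is not negligible beside an [η] of a few ml/g.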
The shape parameter known as the "viscosity increment" ν (see, for example, Tanford, 1961; Harding, 1995) is obtained from

$$\nu = \frac{[\eta]}{v_s} \tag{17}$$
where $v_s$ (ml/g), the "swollen specific volume", is the volume of the hydrated protein per unit mass of dry protein. It is related to the partial specific volume $\bar{v}$ by $v_s = \bar{v} + (\delta/\rho_0)$, where δ (sometimes given the symbol "w") is known as the "hydration", the number of grams of solvent bound per gram of dry protein. Alternatively, in terms of the (hydrated) volume V (ml) of the protein, $v_s = V N_A/M$, where $N_A$ is Avogadro's number. Since ν takes the Einstein (1906, 1911) value of 2.5 for spheres, and since $V = (4/3)\pi r_H^3$, the hydrodynamic radius can be found. This provides an alternative to dynamic light scattering for measuring it. ν has also been evaluated for prolate and oblate ellipsoid models. The direct formulae are complicated (Harding, 1995), but simple polynomial approximations accurate to ±1% are available (Harding and Colfen, 1995). Hence, provided a value for $v_s$ (or δ) is known or assumed, the axial ratio a/b can be found. The value typically taken for δ for proteins is about 0.35 (corresponding to $v_s \approx 1$ ml/g). For unconjugated proteins, however, it can vary by about ±100%, and for heavily glycosylated proteins, such as those from mucus secretions, δ can be as high as about 70 (Harding et al., 1983). Caution is therefore needed when assigning a conformation from viscosity data alone. For triaxial ellipsoids, evaluation of ν merely specifies a line solution of possible values of (a/b, b/c) between the extremes of the prolate ellipsoid (b/c = 1) and the oblate ellipsoid (a/b = 1) (Figure 7). Besides an assumption about δ, a further independent hydrodynamic measurement is necessary to provide a graphical intersection with the ν-line and so specify (a/b, b/c) directly.
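The relations in the preceding paragraph can be sketched in cgs units as below. The sphere-equivalent radius uses only the Einstein value ν = 2.5, so it needs [η] and M but no hydration value. The function names and the protein values ([η] = 3.3 ml/g, M = 14,300 g/mol, δ = 0.35) are hypothetical.

```python
import math

N_A = 6.02214076e23  # Avogadro's number, mol^-1


def swollen_specific_volume(vbar, delta, rho0):
    """v_s = vbar + delta/rho0, in ml/g."""
    return vbar + delta / rho0


def viscosity_increment(intrinsic_ml_per_g, v_s):
    """Equation 17: nu = [eta]/v_s (2.5 for a rigid sphere)."""
    return intrinsic_ml_per_g / v_s


def einstein_radius_cm(intrinsic_ml_per_g, m):
    """Sphere-equivalent radius from nu = 2.5: V = [eta] M / (2.5 N_A) = (4/3) pi r^3."""
    volume = intrinsic_ml_per_g * m / (2.5 * N_A)  # hydrated volume per molecule, cm^3
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


# Hypothetical protein: [eta] = 3.3 ml/g, vbar = 0.73 ml/g, delta = 0.35 g/g, rho0 = 1.0 g/ml.
v_s = swollen_specific_volume(0.73, 0.35, 1.0)
print(viscosity_increment(3.3, v_s))              # about 3.06: nu > 2.5 implies asymmetry if delta is right
print(einstein_radius_cm(3.3, 14300.0) * 1e7, "nm")
```

Because δ enters only through $v_s$, a change in the assumed hydration shifts ν directly, which is why the text cautions against assigning shape from viscosity alone.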
For bead modelling, computer programs are available for predicting ν (or [η]) for a given set of bead coordinates, such as HYDRO (Garcia de la Torre et al., 1994) or the more recent size-independent SOLPRO algorithm (Garcia de la Torre et al., 1997). This procedure can thus be used to select which model gives the measured [η] (after assuming a value for δ). Because of the uniqueness problems referred to above, however, the [η] data cannot be used in isolation for bead modelling. They have to be combined with other hydrodynamic measurements (e.g., sedimentation, diffusion, x-ray scattering, rotational diffusion).
Figure 7. Plots of constant values (i.e., "line solutions") for ν and P as a function of the two triaxial ellipsoid axial ratios. Simulated data, for a hypothetical molecule of "real" (a/b, b/c) = (2.0, 2.0). Adapted from Harding and Rowe (1983). The intersection should in principle give a unique value for (a/b, b/c), although this particular choice of shape functions gives too shallow an intersection.
Sedimentation Velocity and Dynamic Light Scattering
The principal conformation parameter to come out of both these measurements is the frictional ratio $(f/f_0)$. This is the ratio of the frictional coefficient of the protein to that of a rigid spherical particle of the same anhydrous mass and volume. It can be related to either $s^0_{20,w}$ or $D^0_{20,w}$ by

$$\frac{f}{f_0} = \frac{M(1 - \bar{v}\rho_0)}{N_A\, 6\pi\eta_0\, s^0_{20,w}} \left(\frac{4\pi N_A}{3\bar{v}M}\right)^{1/3} \tag{18}$$
or

$$\frac{f}{f_0} = \frac{k_B T}{6\pi\eta_0\, D^0_{20,w}} \left(\frac{4\pi N_A}{3\bar{v}M}\right)^{1/3} \tag{19}$$
(see, for example, Tanford, 1961; Harding, 1995), where $\eta_0$ is the viscosity of water at 20.0 °C. To get shape information from equations 18 or 19, a function P is first defined, named in recognition of F. Perrin, who worked out the theory of the frictional coefficients of ellipsoids:

$$P = \left(\frac{f}{f_0}\right)\left[\frac{\delta}{\bar{v}\rho_0} + 1\right]^{-1/3} \tag{20}$$

Then, as in viscometry, P can be obtained if δ is known or assumed. The "Perrin function" P is analogous to the viscosity increment ν. The axial ratio (a/b) for an ellipsoid of revolution can be found from P either from a rather complicated expression involving an elliptic integral or from simple polynomial expansions available for both prolate and oblate ellipsoids (Harding and Colfen, 1995). For general triaxial ellipsoids, as with ν, there is a line solution of possible values for P (Figure 7). In principle, {a/b, b/c} can be found from the graphical intersection, but, as is clear from Figure 7, this intersection is too shallow to cope with any data error. Other combinations involving these or other shape functions need to be employed.
Use of Concentration Dependence Parameters, Combined Shape Functions, and the Radius of Gyration $R_g$
In principle, a simple way of solving the hydration problem is to combine two shape functions so that the experimental requirement for δ or $v_s$ is eliminated. The result is a combined "hydration-independent" shape function. The simplest of these is known as the β-function and comes from combining equation 17 with 18 or 19 (see, for example, Tanford, 1961; Van Holde, 1985; Harding, 1995). Unfortunately, this function is highly insensitive to shape and of very limited use for conformation analysis. In fact, it has found more use as a quasi-constant parameter for calculating M from [η] and $s^0_{20,w}$ or $D^0_{20,w}$ (Yang, 1961). A more useful combination is [η] with $k_s$, the concentration dependence regression parameter from sedimentation velocity measurements (cf. equation 9). This requires that the sedimentation measurements have been made in a buffer of sufficient ionic strength, I, to suppress charge effects. To an approximation (Rowe, 1977; Rowe, 1992), the ratio

$$R = \frac{k_s}{[\eta]} = \frac{2(1 + P^3)}{\nu} \tag{21}$$
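The route from a measured sedimentation coefficient to $f/f_0$ (equation 18), P (equation 20) and the predicted R (equation 21) is sketched below in cgs units. The water constants (η₀ = 0.01002 P, ρ₀ = 0.99823 g/ml at 20 °C) are standard values supplied here. The function names and test molecule are hypothetical. The built-in check uses an anhydrous rigid sphere, for which $f/f_0 = 1$ and P = 1, so equation 21 with ν = 2.5 gives R = 1.6.

```python
import math

N_A = 6.02214076e23  # mol^-1
ETA_20W = 0.01002    # viscosity of water at 20.0 degC, poise (assumed standard value)
RHO_20W = 0.99823    # density of water at 20.0 degC, g/ml (assumed standard value)


def frictional_ratio_from_s(s20w, m, vbar, eta0=ETA_20W, rho0=RHO_20W):
    """Equation 18 in cgs units: s in seconds (1 S = 1e-13 s), M in g/mol, vbar in ml/g."""
    f = m * (1.0 - vbar * rho0) / (N_A * s20w)                    # frictional coefficient, g/s
    r0 = (3.0 * vbar * m / (4.0 * math.pi * N_A)) ** (1.0 / 3.0)  # anhydrous sphere radius, cm
    return f / (6.0 * math.pi * eta0 * r0)


def perrin_p(f_ratio, delta, vbar, rho0=RHO_20W):
    """Equation 20: remove the hydration contribution from f/f0."""
    return f_ratio * (delta / (vbar * rho0) + 1.0) ** (-1.0 / 3.0)


def r_measured(ks, intrinsic):
    """Left-hand side of equation 21: R = k_s/[eta] (both in ml/g)."""
    return ks / intrinsic


def r_predicted(p, nu):
    """Right-hand side of equation 21 (an approximation): R = 2(1 + P^3)/nu."""
    return 2.0 * (1.0 + p ** 3) / nu


# Consistency check with a hypothetical anhydrous sphere of M = 50000 g/mol.
m, vbar = 50000.0, 0.73
r0 = (3.0 * vbar * m / (4.0 * math.pi * N_A)) ** (1.0 / 3.0)
s_sphere = m * (1.0 - vbar * RHO_20W) / (N_A * 6.0 * math.pi * ETA_20W * r0)
print(frictional_ratio_from_s(s_sphere, m, vbar), r_predicted(perrin_p(1.0, 0.0, vbar), 2.5))
```

Comparing `r_measured` with `r_predicted` over a range of shapes is the idea behind using R as a hydration-independent function. In practice, the polynomial inversions cited above would be used rather than a forward comparison.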
Another is the combination of the second thermodynamic virial coefficient, B, with [η] to define the hydration-independent shape function Π (Harding, 1981, 1995). B is obtained from the concentration dependence of the apparent molar mass measured by sedimentation equilibrium.

$$\Pi = \frac{2BM}{[\eta]} - \frac{f(Z, I)}{[\eta]M} \tag{22}$$
Here the second term on the right-hand side [a function of molecular charge or valency (Z) and ionic strength (I)] goes to zero if the ionic strength is sufficient (normally > 0.3 M). As with ν and P above, both R and Π are available as simple polynomial expansions in terms of the axial ratio a/b for ellipsoids of revolution (Harding and Colfen, 1995). They are also available as line solutions for {a/b, b/c} for triaxial ellipsoids. Unlike ν and P, they of course require no assumption concerning hydration. Unfortunately, plotting R with Π gives an intersection as poor as that shown in Figure 7. A better combination is Π with the radius of gyration shape function G, defined by (Harding, 1987)

$$G = \left\{\frac{4\pi N_A}{3\bar{v}M}\right\}^{2/3} R_g^2 \tag{23}$$
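Equations 22 and 23 are easily evaluated. The sketch below makes two assumptions: the charge term in equation 22 has been suppressed by high ionic strength, and $v_s \approx \bar{v}$ in equation 23, as discussed next. The names and numbers are hypothetical. The checks use a rigid sphere: $R_g^2 = (3/5)r^2$ gives G = 0.6, and the hard-sphere excluded volume B = 4v_s/M with [η] = 2.5v_s gives Π = 3.2.

```python
import math

N_A = 6.02214076e23  # mol^-1


def pi_function(b, m, intrinsic):
    """Equation 22 at high ionic strength (charge term neglected): Pi = 2BM/[eta].

    B in ml mol g^-2, M in g/mol, [eta] in ml/g.
    """
    return 2.0 * b * m / intrinsic


def g_function(rg_cm, m, vbar):
    """Equation 23: G = (4 pi N_A / (3 vbar M))^(2/3) * Rg^2 (Rg in cm, vbar in ml/g)."""
    return (4.0 * math.pi * N_A / (3.0 * vbar * m)) ** (2.0 / 3.0) * rg_cm ** 2


# Hard-sphere checks with hypothetical values: M = 50000 g/mol, vbar = v_s = 0.73 ml/g.
m, v = 50000.0, 0.73
radius = (3.0 * v * m / (4.0 * math.pi * N_A)) ** (1.0 / 3.0)  # sphere of volume v*M/N_A
rg_sphere = math.sqrt(3.0 / 5.0) * radius                      # Rg^2 = (3/5) r^2, solid sphere
b_sphere = 4.0 * v / m                                         # excluded-volume B = 4 v_s / M
print(round(g_function(rg_sphere, m, v), 6))                   # -> 0.6
print(round(pi_function(b_sphere, m, 2.5 * v), 6))             # -> 3.2
```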
$R_g$ is obtained from a light scattering (or x-ray or neutron scattering) measurement. Two conditions allow the specific volume term in equation 23 to refer to the anhydrous protein ($v_s \approx \bar{v}$), so that no assumed value for the hydration is required. First, the surface (aqueous) solvent on the protein must be, to a good approximation, indistinguishable from the surrounding solvent. Second, the protein must not be internally swollen through hydration. G also has a line solution for triaxial ellipsoids, and graphical combination of G with Π does give a reasonable intersection. This has been used to investigate the overall conformation of myosin in solution (Harding, 1987). For bead modelling, $R_g$ (i.e., G) from x-ray and neutron scattering and $s^0_{20,w}$ (or P) have been used with the earlier modelling program TRV (Garcia de la Torre, 1989), within the limitations referred to above, to distinguish plausible conformations for antibody models (Gregory et al., 1987). These studies have shown that the molecules are clearly not coplanar, as textbooks sometimes rather misleadingly depict them. $R_g$ with $s^0_{20,w}$ has been used to select appropriate models for the complement system (Perkins, 1989) (see Figure 6), and a combination of $D^0_{20,w}$ and $s^0_{20,w}$ has been used to model the self-assembly of T-even bacteriophages (Garcia de la Torre, 1989; Garcia de la Torre and Bloomfield, 1977). $R_g$ combined with [η] and electro-optic data has been used to model, in terms of bending energies, the flexibility of the region of myosin between the S2 segment and light meromyosin (LMM) (Iniesta et al., 1989; Garcia de la Torre, 1989).
Measurement and Use of Rotational Hydrodynamic Shape Functions: Fluorescence Depolarization Decay
A protein in solution is subject to rotational Brownian motion. The ease or rate with which a protein rotates depends on its size, shape, and hydration, the same three factors that determine its rate of translational diffusion. Therefore, if the size and hydration are known (or can be eliminated by combination with another measurement), a rotational diffusion property can be used as another probe of shape. These measurements tend to be more difficult, but the incentive is that the shape functions derived from them are more sensitive functions of shape. The principal methods have been fluorescence depolarization, electro-optics and, more recently, nuclear magnetic resonance (Garcia de la Torre et al., 1998). The most popular rotational diffusion probe is fluorescence depolarization (Weber, 1952; Van Holde, 1985). In this method, the protein must carry a suitable fluorescent chromophore, either intrinsic (such as tryptophan) or synthetically attached. When excited by polarized light of the appropriate wavelength, it emits fluorescent light that is polarized. As the proteins rotate under Brownian forces, the degree of polarization decays at a rate dependent on the speed of rotation of the molecules. Detectors fitted with polarizers measure the intensities of light polarized parallel ($I_\parallel$) and perpendicular ($I_\perp$) to the incident pulse, and the anisotropy is calculated as

$$A = \frac{I_\parallel - I_\perp}{I_\parallel + 2I_\perp} \tag{24}$$

In the "steady-state method", the protein solution is continuously irradiated, and A is measured in solutions at a variety of temperatures and viscosities (usually adjusted by adding glycerol). With knowledge of the fluorescence lifetime of the chromophore, the harmonic mean relaxation time $\tau_h$ (units: s) can then be obtained by extrapolating a plot of 1/A versus $T/\eta_0$ to $T/\eta_0 = 0$, where $\eta_0$ is the solvent viscosity at temperature T (Weber, 1952; Van Holde, 1985). As with other hydrodynamic parameters, $\tau_h$ in principle needs to be extrapolated to zero concentration to eliminate any contributions from nonideality effects. To obtain shape information from $\tau_h$, a ratio $\{\tau_h/\tau_0\}$ is defined (by analogy with the Perrin function P):

$$\frac{\tau_h}{\tau_0} = \frac{k T \tau_h}{\eta_0 V} \tag{25}$$
where the volume of the protein is $V = v_s M/N_A$. To remove the need to know $v_s$ (i.e., the hydration), $\{\tau_h/\tau_0\}$ is combined with [η] to produce the hydration-independent parameter Λ (Harding, 1980, 1995; Harding and Rowe, 1982a):

$$\Lambda = \frac{\nu}{\{\tau_h/\tau_0\}} = \frac{\eta_0 [\eta] M}{N_A k T \tau_h} \tag{26}$$
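Equations 24 to 26 are sketched below in cgs units (Boltzmann constant in erg/K). The function names and inputs ([η] = 3.3 ml/g, M = 20,000 g/mol, $\tau_h$ = 3 × 10⁻⁸ s, an assumed $v_s$ = 1.08 ml/g) are hypothetical. The last line computes Λ both directly and as ν/(τ_h/τ_0), to show that the assumed $v_s$ cancels, as equation 26 implies.

```python
K_B = 1.380649e-16   # Boltzmann constant, erg/K
N_A = 6.02214076e23  # mol^-1


def anisotropy(i_par, i_perp):
    """Equation 24: A = (I_par - I_perp)/(I_par + 2 I_perp)."""
    return (i_par - i_perp) / (i_par + 2.0 * i_perp)


def tau_ratio(tau_h, temp_k, eta0, v_s, m):
    """Equation 25: tau_h/tau_0 = k T tau_h/(eta0 V), with V = v_s M/N_A (cgs units)."""
    volume = v_s * m / N_A  # hydrated volume per molecule, cm^3
    return K_B * temp_k * tau_h / (eta0 * volume)


def lambda_function(intrinsic, m, tau_h, temp_k, eta0):
    """Equation 26: Lambda = eta0 [eta] M/(N_A k T tau_h); no hydration value needed."""
    return eta0 * intrinsic * m / (N_A * K_B * temp_k * tau_h)


# Hypothetical inputs at 20 degC in water (eta0 = 0.01002 poise).
lam = lambda_function(3.3, 20000.0, 3e-8, 293.15, 0.01002)
nu = 3.3 / 1.08  # viscosity increment for an assumed v_s = 1.08 ml/g
print(lam, nu / tau_ratio(3e-8, 293.15, 0.01002, 1.08, 20000.0))
```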
As with the other shape functions referred to above, simple polynomial equations are available that relate Λ to the axial ratio of ellipsoids of revolution. An example of its application to the globular protein neurophysin can be found in Rholam and Nicolas (1981). Λ is also available for triaxial ellipsoids, and a graphical combination of Λ with R can be used to obtain {a/b, b/c} uniquely (Harding and Rowe, 1982b). This method has been used to confirm a conclusion previously reached with the ellipsoid of revolution model (Rholam and Nicolas, 1981): neurophysin clearly dimerizes side-by-side rather than end-to-end (Figure 8). These references also illustrate, respectively, the extraction of [η], $k_s$, and $\tau_h$ (and hence R and Λ) for a dimerizing system.

Some words of caution are needed, however. Fluorescence depolarization, like other rotational diffusion techniques, is a particularly sensitive probe of conformation, but three conditions must be met. First, particularly for synthetically attached fluorescent chromophores, it must be established that the chromophore cannot rotate freely relative to the rest of the molecule. Second, for proteins containing more than one domain, segmental flexibility can obscure the shape measurement (Johnson and Mihalyi, 1965). Third, in the steady-state method described above, the use of solvents of differing T and η must cause no significant conformation change.
[Figure 8 panels: (a) monomers and (b) dimers; Λ and R line solutions, with a/b on the horizontal axis.]
Figure 8. Triaxial ellipsoid gross conformation evaluations for (a) neurophysin monomers and (b) neurophysin dimers. Plots of constant values for Λ and R ("line solutions") in the {a/b, b/c} plane. These analyses require knowledge of three hydrodynamic parameters (for monomer and dimer): [η] (intrinsic viscosity), $\tau_h$ (from steady-state fluorescence depolarization), and $k_s$ (from sedimentation velocity). Monomers: {a/b, b/c} = (4, 1); dimers: {a/b, b/c} = (2.8, 2.5). Redrawn and adapted from Harding and Rowe (1982b).
The harmonic mean is itself a mean over different rotational relaxation modes of the protein, each containing potential shape information. Resolving these requires a pulsed light source, time-resolved measurements, and mathematical algorithms for adequate deconvolution of the light source decay function and resolution of multiexponential terms. This is by no means a simple task (see, for example, Han et al., 1989; Livesey and Brochon, 1989). Electric birefringence (or dichroism) decay is, however, an attractive alternative to time-resolved fluorescence anisotropy decay measurements, since for a given isotropic, monodisperse, asymmetric scatterer there are just two exponentials to resolve (Ridgeway, 1966). A serious limitation of electro-optical methods, however, has been their restriction to solutions of low ionic strength, because of heating effects caused by the strong electric fields used.
However, a significant recent advance has been the design of an instrument with adequate shielding against such effects (Porschke and Obst, 1991), permitting the use of solvents at physiological ionic strengths. The application of electric birefringence methods to triaxial ellipsoid modelling can be found in Harding and Rowe (1983), and to bead modelling in Porschke and Antosiewicz (1989). Finally, NMR appears highly promising as a route for obtaining time-resolved rotational relaxation times (Garcia de la Torre et al., 1998).
Some Computer Programs for Conformational Analysis
For ellipsoid modelling, the ELLIPS series of programs for the PC (BASIC and FORTRAN) has been developed (Harding and Colfen, 1995; Harding et al., 1997). ELLIPS1 evaluates the axial ratio a/b for prolate and oblate ellipsoids from a user-specified value of a hydrodynamic parameter. It is based on polynomial approximations to the full hydrodynamic equations, whose accuracy is normally well within the precision of the measurement. ELLIPS2 uses the full hydrodynamic equations for general triaxial ellipsoids to specify the set of hydrodynamic parameters for any given value of the axial ratios {a/b, b/c}. ELLIPS3 and ELLIPS4 perform the reverse procedure using a variety of graphical combinations of hydration-independent triaxial shape functions (cf. Figures 7 and 8). For bead models, the routine SOLPRO (Garcia de la Torre et al., 1997, 1998) is particularly useful.
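To illustrate the kind of inversion ELLIPS1 performs, the sketch below recovers a/b from a measured Perrin function P. It is not ELLIPS code. Instead of the polynomial approximations of Harding and Colfen (1995), it uses Perrin's closed-form expressions for ellipsoids of revolution, standard results taken from the general literature rather than from this chapter, and inverts them numerically by bisection. The function names, the bracketing interval and the example value P = 1.10 are hypothetical.

```python
import math


def perrin_prolate(p):
    """Perrin translational frictional ratio P for a prolate ellipsoid, axial ratio p = a/b > 1."""
    s = math.sqrt(p * p - 1.0)
    return s / (p ** (1.0 / 3.0) * math.log(p + s))


def perrin_oblate(p):
    """Perrin translational frictional ratio P for an oblate ellipsoid, axial ratio p = a/b > 1."""
    s = math.sqrt(p * p - 1.0)
    return s / (p ** (2.0 / 3.0) * math.atan(s))


def axial_ratio(target, shape_fn, lo=1.000001, hi=1000.0, iters=200):
    """Invert a monotonically increasing shape function by bisection on [lo, hi]."""
    if not shape_fn(lo) <= target <= shape_fn(hi):
        raise ValueError("target outside the range covered by [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if shape_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical measured P = 1.10, with hydration already removed via equation 20.
print(axial_ratio(1.10, perrin_prolate), axial_ratio(1.10, perrin_oblate))
```

The same P yields a larger a/b for the oblate model than for the prolate one. This is one illustration of why a single shape function cannot discriminate between models.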
REFERENCES
Ackers, G. (1975). Molecular sieve methods of analysis. In: The Proteins, Third ed. (Neurath, H. and Hill, R.L., Eds.), p. 1. Academic Press, New York.
Andrews, P. (1965). Estimation of molecular weights of proteins by Sephadex gel filtration. Biochem. J. 91, 22.
Antosiewicz, J. and Porschke, D. (1989). An unusual electrooptical effect observed for DNA fragments and its apparent relation to a permanent electric moment associated with bent DNA. Biophys. Chem. 33, 19.
Arner, E.C. and Kirkland, J.J. (1992). In: Analytical Ultracentrifugation in Biochemistry and Polymer Science (Harding, S.E., Rowe, A.J., and Horton, J.C., Eds.), p. 209. Royal Society of Chemistry, Cambridge, England.
Barth, H.G. (1980). A practical approach to steric exclusion chromatography of water-soluble polymers. J. Chromatog. Sci. 18, 409.
Brown, W. (Ed.) (1993). Dynamic Light Scattering. The Method and Some Applications. Oxford University Press, Oxford.
Burchard, W. (1992). Static and dynamic light scattering approaches to structure determination of biopolymers. In: Laser Light Scattering in Biochemistry (Harding, S.E., Sattelle, D.B., and Bloomfield, V.A., Eds.), pp. 3-22. Royal Society of Chemistry, Cambridge, England.
Claes, P., Dunford, M., Kennedy, A., and Vardy, P. (1992). An on-line dynamic light-scattering instrument for macromolecular characterization. In: Laser Light Scattering in Biochemistry (Harding, S.E., Sattelle, D.B., and Bloomfield, V.A., Eds.), pp. 66-76. Royal Society of Chemistry, Cambridge, England.
Colfen, H. and Harding, S.E. (1997). MSTARA and MSTARI: Interactive PC algorithms for simple, model-independent evaluation of sedimentation equilibrium data. Eur. Biophys. J. 25, 333-346.
Colfen, H., Harding, S.E., Wilson, E.K., Scrutton, N.S., and Winzor, D.J. (1997). Low temperature solution behaviour of Methylophilus methylotrophus electron-transferring flavoprotein: A study by analytical ultracentrifugation. Eur. Biophys. J. 25, 411-416.
Correia, J.J. and Yphantis, D.A. (1992). Equilibrium sedimentation in short solution columns. In: Analytical Ultracentrifugation in Biochemistry and Polymer Science (Harding, S.E., Rowe, A.J., and Horton, J.C., Eds.), pp. 231-252. Royal Society of Chemistry, Cambridge, England.
Creeth, J.M. and Harding, S.E. (1982). Some observations on a new type of point average molecular weight. J. Biochem. Biophys. Meth. 7, 25-34.
Darby, N., Harding, S.E., and Creighton, T.E. (1997). (In press.)
Dubin, P.L. and Principi, J.M. (1989). No previously suggested dimensional parameter controls peak migration in size exclusion chromatography. Div. Polym. Chem., Am. Chem. Soc. Preprints 30, 400-401.
Einstein, A. (1906). Ann. Physik 19, 289-305; and corrigenda (1911) 34, 591-592.
Garcia de la Torre, J. (1989). Hydrodynamic properties of macromolecular assemblies. In: Dynamic Properties of Biomolecular Assemblies (Harding, S.E. and Rowe, A.J., Eds.), pp. 3-31. Royal Society of Chemistry, Cambridge, England.
Garcia de la Torre, J. (1992). Sedimentation coefficients of complex biological particles. In: Analytical Ultracentrifugation in Biochemistry and Polymer Science (Harding, S.E., Rowe, A.J., and Horton, J.C., Eds.), pp. 333-345. Royal Society of Chemistry, Cambridge, England.
Garcia de la Torre, J. (1994). Hydrodynamics of segmentally flexible macromolecules. Eur. Biophys. J. 23, 307-322.
Garcia de la Torre, J. and Bloomfield, V.A. (1977). Hydrodynamics of macromolecular complexes. 3. Bacterial viruses. Biopolymers 16, 1779-1793.
Garcia de la Torre, J., Carrasco, B., and Harding, S.E. (1997). SOLPRO: Theory and computer program for the prediction of SOLution PROperties of rigid macromolecules and bioparticles. Eur. Biophys. J. 25, 361-372.
Garcia de la Torre, J., Harding, S.E., and Carrasco, B. (1998). Calculation of NMR relaxation, covolume, and scattering-related properties of bead models using the SOLPRO computer program. Eur. Biophys. J. (in press).
Garcia de la Torre, J., Navarro, S., Lopez Martinez, M.C., Diaz, F.G., and Lopez Cascales, J.J. (1994). HYDRO: A computer program for the prediction of hydrodynamic properties of macromolecules. Biophys. J. 67, 530-531.
Gregory, L., Davis, K.G., Sheth, B., Boyd, J., Jefferis, R., Nave, C., and Burton, D.R. (1987). The solution conformations of the subclasses of human IgG deduced from sedimentation and small angle X-ray scattering studies. J. Mol. Immunol. 24, 821-830.
Han, M.K., Knutson, J.R., and Brand, L. (1989). Fluorescence studies of protein-subunit interactions. In: Dynamic Properties of Biomolecular Assemblies (Harding, S.E. and Rowe, A.J., Eds.), pp. 115-134. Royal Society of Chemistry, Cambridge, England.
Haney, M.A. (1985). A differential viscometer. American Laboratory 17, 41-56.
Harding, S.E. (1980). The combination of the viscosity increment with the harmonic mean rotational relaxation time for determining the conformation of biological macromolecules in solution. Biochem. J. 189, 359-361.
Harding, S.E. (1981). A compound hydrodynamic shape function derived from viscosity and molecular covolume measurements. Int. J. Biol. Macromol. 3, 340-341.
Harding, S.E. (1986). Applications of light scattering in microbiology. Biotech. Appl. Biochem. 8, 489-509.
Harding, S.E. (1987). A general method for modeling macromolecular shape in solution. A graphical (Π-G) intersection procedure for triaxial ellipsoids. Biophys. J. 51, 673-680.
Harding, S.E. (1995). On the hydrodynamic analysis of macromolecular conformation. Biophys. Chem. 55, 69-93. Harding, S.E. (1997). The intrinsic viscosity of biological macromolecules. Progress in measurement, interpretation, and application to structure in dilute solution. Prog. Biophys. Mol. Biol. 68, 207-262. Harding, S.E. and Colfen, H. (1995). Inversion formulae for ellipsoid of revolution macromolecular shape functions. Analyt. Biochem. 228, 131-142. Harding, S.E. and Johnson, P. (1985). Physicochemical studies on turnip-yellow-mosaic virus. Homogeneity, relative molecular masses, hydrodynamic radii and concentration-dependence of parameters in nondissociating solvents. Biochem. J. 231, 549-555. Harding, S.E. and Rowe, A.J. (1982a). Modeling biological macromolecules in solution. 1. The ellipsoid of revolution. Int. J. Biol. Macromol. 4, 160-164. Harding, S.E. and Rowe, A.J. (1982b). Modeling biological macromolecules in solution. 3. The lambda-R intersection method for triaxial ellipsoids. Int. J. Biol. Macromol. 4, 357-361. Harding, S.E. and Rowe, A.J. (1983). Modeling biological macromolecules in solution. 2. The general triaxial ellipsoid. Biopolymers 22, 1813-1829. Harding, S.E. and Rowe, A.J. (1984). Modeling biological macromolecules in solution. 2. The general triaxial ellipsoid. Biopolymers 23, 843. Harding, S.E., Rowe, A.J., and Creeth, J.M. (1983). Further evidence for a flexible and highly expanded spheroidal model for mucus glycoproteins in solution. Biochem. J. 209, 893-896. Harding, S.E., Rowe, A.J., and Horton, J.C. (Eds.) (1992a). Analytical Ultracentrifugation in Biochemistry and Polymer Science. Royal Society of Chemistry, Cambridge, England. Harding, S.E., Horton, J.C., and Morgan, P.J. (1992b). MSTAR: A FORTRAN program for the model-independent molecular weight analysis of macromolecules using low speed or high speed sedimentation analysis.
In: Analytical Ultracentrifugation in Biochemistry and Polymer Science. (Harding, S.E., Rowe, A.J., and Horton, J.C., Eds.), pp. 275-294. Royal Society of Chemistry, Cambridge, England. Harding, S.E., Horton, J.C., and Colfen, H. (1997). The ELLIPS suite of macromolecular conformation algorithms. Eur. Biophys. J. 25, 347-359. Harding, S.E., Horton, J.C., and Johnson, P. (1997). (To be published.) Iniesta, A., Diaz, F.G., and Garcia de la Torre, J. (1989). Transport properties of rigid bent-rod macromolecules and of semi-flexible broken rods in the rigid-body treatment: Analysis of the flexibility of myosin rod. Biophys. J. 54, 269-276. Johnsen, R.M. and Brown, W. (1992). An overview of current methods of analysing QLS data. In: Laser Light Scattering in Biochemistry. (Harding, S.E., Sattelle, D.B., and Bloomfield, V.A., Eds.), pp. 77-91. Royal Society of Chemistry, Cambridge, England. Johnson, P. (1984). Light-scattering and correlation-measurement. Biochem. Soc. Trans. 12, 623-625. Johnson, P. and Mihalyi, E. (1965). Physicochemical studies of bovine fibrinogen. 2. Depolarization of fluorescence studies. Biochim. Biophys. Acta 102, 476-486. Jumel, K., Fiebrig, I., and Harding, S.E. (1996). Rapid size distribution and purity analysis of gastric mucus glycoproteins by size exclusion chromatography/multiangle laser light scattering. Int. J. Biol. Macromol. 18, 133-139. Kratky, O., Leopold, H., and Stabinger, H. (1973). The determination of the partial specific volume of proteins by mechanical oscillator technique. Meth. Enzymol. 27D, 98-110. Langley, K.H. (1992). Developments in electrophoretic laser light scattering and some biochemical applications. In: Laser Light Scattering in Biochemistry. (Harding, S.E., Sattelle, D.B., and Bloomfield, V.A., Eds.), pp. 151-160. Royal Society of Chemistry, Cambridge, England. Livesey, A.K. and Brochon, J. (1989). Maximum entropy data analysis of dynamic parameters from pulsed-fluorescent decays.
In: Dynamic Properties of Biomolecular Assemblies. (Harding, S.E. and Rowe, A.J., Eds.), p. 135. Royal Society of Chemistry, Cambridge, England.
Perkins, S.J. (1986). Protein volumes and hydration effects: The calculations of partial specific volumes, neutron-scattering match points and 280-nm absorption coefficients for proteins and glycoproteins from amino-acid sequences. Eur. J. Biochem. 157, 169-180. Perkins, S.J. (1989). Hydrodynamic modelling of complement. In: Dynamic Properties of Biomolecular Assemblies. (Harding, S.E. and Rowe, A.J., Eds.), pp. 226-245. Royal Society of Chemistry, Cambridge, England. Perkins, S.J. (1994). In: Microscopy, Optical Spectroscopy and Macroscopic Techniques. (Jones, C., Mulloy, B., and Thomas, A.H., Eds.), Vol. 22, p. 39. Humana Press, NJ. Philo, J. (1994). Measuring sedimentation, diffusion, and molecular weights of small molecules by direct fitting of sedimentation velocity profiles. In: Modern Analytical Ultracentrifugation. (Schuster, T.M. and Laue, T.M., Eds.), pp. 156-170. Birkhauser, Boston. Porschke, D. and Antosiewicz, J. (1989). Analysis of macromolecular structures in solution by electrooptical procedures. In: Dynamic Properties of Biomolecular Assemblies. (Harding, S.E. and Rowe, A.J., Eds.), pp. 103-114. Royal Society of Chemistry, Cambridge, England. Porschke, D. and Obst, A. (1991). An electric field jump apparatus with ns time resolution for electrooptical measurements at physiological salt concentrations. Rev. Sci. Instrum. 62, 818-820. Pusey, P.N. (1974). Macromolecular diffusion. In: Photon Correlation and Light Beating Spectroscopy. (Cummins, H.Z. and Pike, E.R., Eds.), pp. 387-428. Plenum, New York. Rholam, M. and Nicolas, P. (1981). Side-by-side dimerization of neurophysin: Sedimentation-velocity, viscometry, and fluorescence polarization studies. Biochemistry 20, 5837-5843. Ridgeway, D. (1966). Transient electric birefringence of suspensions of asymmetric ellipsoids. J. Am. Chem. Soc. 88, 1104-1112. Roark, D. and Yphantis, D.A. (1969). Studies of self-associating systems by equilibrium ultracentrifugation. Ann. N.Y. Acad. Sci. 164, 245-278. Rowe, A.J.
(1977). Concentration-dependence of transport processes: A general description applicable to sedimentation, translational diffusion, and viscosity coefficients of macromolecular solutes. Biopolymers 16, 2595-2611. Rowe, A.J. (1978). Techniques for determining molecular weight. Techn. Life Sci.: Biochem. B105a, 1-31. Rowe, A.J. (1992). The concentration-dependence of sedimentation. In: Analytical Ultracentrifugation in Biochemistry and Polymer Science. (Harding, S.E., Rowe, A.J., and Horton, J.C., Eds.), pp. 394-406. Royal Society of Chemistry, Cambridge, England. Sanders, A.H. and Cannell, D.S. (1980). In: Light Scattering in Liquids and Macromolecular Solutions. (Degiorgio, V., Corti, M., and Giglio, M., Eds.), p. 173. Plenum, New York. Schachman, H.K. (1989). Analytical ultracentrifugation reborn. Nature 341, 259-260. Schachman, H.K., Pauza, C.D., Navre, M., Karela, M.J., Wu, L., and Yang, Y.R. (1984). Location of amino-acid alterations in mutants of aspartate transcarbamylase: Structural aspects of interallelic complementation. Proc. Natl. Acad. Sci. USA 81, 115-119. Schmitz, K.S. (1990). An Introduction to Dynamic Light Scattering by Macromolecules. Academic Press, New York. Schuster, T.M. and Laue, T.M. (Eds.) (1994). Modern Analytical Ultracentrifugation. Birkhauser, Boston. Stafford, W.F. (1992). Methods for obtaining sedimentation coefficient distributions. In: Analytical Ultracentrifugation in Biochemistry and Polymer Science. (Harding, S.E., Rowe, A.J., and Horton, J.C., Eds.), pp. 359-393. Royal Society of Chemistry, Cambridge, England. Štěpánek, P. (1993). Data analysis in dynamic light scattering. In: Dynamic Light Scattering. The Method and Some Applications. (Brown, W., Ed.). Oxford University Press, Oxford, England. Svedberg, T. and Pedersen, K.O. (1940). The Ultracentrifuge. Oxford University Press, Oxford, England. Tanford, C. (1955). Intrinsic viscosity and kinematic viscosity. J. Phys. Chem. 59, 798-799. Tanford, C. (1961).
Physical Chemistry of Macromolecules. J. Wiley & Sons, New York.
Teller, D.C. (1973). Characterization of proteins by sedimentation equilibrium in the ultracentrifuge. Meth. Enzymol. 27D, 346-441. Tombs, M.P. and Peacocke, A.R. (1974). The Osmotic Pressure of Biological Macromolecules. Oxford University Press, Oxford. Van Holde, K.E. (1971). Physical Biochemistry. First ed. Prentice Hall, Englewood Cliffs, New Jersey. Van Holde, K.E. (1985). Physical Biochemistry. Second ed. Prentice Hall, Englewood Cliffs, New Jersey. Weber, G. (1952). Polarization of the fluorescence of macromolecules. I. Theory and experimental method. Biochem. J. 51, 145-155. Weber, G. (1952). Polarization of the fluorescence of macromolecules. II. Fluorescent conjugates of ovalbumin and bovine serum albumin. Biochem. J. 51, 155-167. Wells, C., Molina-Garcia, A.D., Harding, S.E., and Rowe, A.J. (1990). Self-interaction of dynein from Tetrahymena cilia. J. Muscle Res. Cell Motil. 11, 344-350. Wyatt, P.J. (1992). In: Laser Light Scattering in Biochemistry. (Harding, S.E., Sattelle, D.B., and Bloomfield, V.A., Eds.), pp. 35-58. Royal Society of Chemistry, Cambridge, England. Yang, J.T. (1961). Viscosity of macromolecules in relation to molecular conformation. Adv. Prot. Chem. 16, 323-400.
INDEX
Acetylcholinesterase, 89 Aconitase, 116 Acyl-coenzyme A binding protein, 169, 187 O-acylisourea, 36 Aggregation, 6, 155, 156, 190, 201, 220, 222, 240, 256, 258, 263, 281 Alcohol dehydrogenase, 126, 134 Aldolase, 201 Alkaline phosphatase, 124, 126, 129, 133 Amino acid analysis, 35 Ammonium sulphate, 4 Anomalous data, 9, 14 Antibody, 5, 292, 293, 298 Apomyoglobin, 84, 173, 188, 255 Arc repressor, 197-198 Arrhenius diagrams, 180 Arsenite ion, 43 Aspartate transcarbamylase, 68, 74, 76, 89, 272 Aspartokinase-homoserine dehydrogenase, 200 Autocorrelator, 280 Azurin, 121 Bacteriorhodopsin, 77, 85, 264 Barnase, 76, 162, 174, 175, 184-185 Barstar, 162
Biological Macromolecule Crystallization Database (BMCD), 4 Bohr effect, 62 Boltzmann probability formula, 227-229 Borate, 34, 35 BPTI (see Trypsin inhibitor) Bragg's law, 8 Bromoacetamide, 39 Bromoacetic acid, 39 N-Bromosuccinimide, 29, 46 Brookhaven Protein Databank, 2 Butanedione, 34, 35 Calbindin, 161 Calibrated gel chromatography, 275 Calorimetry, 157, 165, 191, 238-243, 248, 251, 254-258, 264 Cambridge Crystallographic Database, 129, 146 Cambridge Structural Database, 107-108, 114 Carbonic anhydrase, 118, 120, 135-137, 139, 140 Chaperones, 155, 156, 190, 219 Charge-coupled devices (CCD), 10, 19-20 Chemistry of protein functional groups, 23-59 abstract, 24
amino groups (α-NH2 and lysine), modification of, 29-33 amidination, 31 amino-termini, selective transamination of, 33 citraconic anhydride, 31-32 maleic anhydride, 31-32 pyridoxal phosphate, 30 reducing agents, four, 30 reductive methylation, 29-30 selective modifications of α- and ε-amino groups, 32-33 trinitrobenzenesulfonate, 32 carboxamide groups (asparagine and glutamine), modification of, 37-38 deamidation, 37-38 carboxyl groups (α-COOH, aspartate and glutamate), modification of, 36-37 carbodiimides, water-soluble, and glycine ethyl ester, 36-37 disulfide bonds (cystine), modification of, 44-45 dithiothreitol and other thiols, reduction by, 44-45 guanidino groups (arginine), modification of, 34-36 butanedione, 34-35 phenylglyoxal, 34, 35-36 imidazole groups (histidine), modification of, 33-34 diethyl pyrocarbonate, 33-34 indole groups (tryptophan), modification of, 46 N-bromosuccinimide, 46 introduction, 24-29 amino acid residues, 24, 25 functional groups, 24 modification, chemical, purposes of, 25 reagents and procedures for chemical modification, 26-28 side chains, 24
stabilities of modification procedure products, 29 sulfhydryl groups, 29 phenolic groups (tyrosine), modification of, 46-49 N-acetylimidazole, 49 iodination, 46-48 and radioactive iodine, 47 tetranitromethane, 48-49 sulfhydryl groups (cysteine), modification of, 38-44, 46 N-alkylmaleimides, 39 arsenite ion, 43 dipyridyl disulfide, 42-43 dithio(2-nitrobenzoate), 39-42 N-ethylmaleimide, 38-39 methyl methanethiosulfonate, 39 phenylarsine oxide, 44 vicinal groups, selective reactions with, 43-44 thioether groups (methionine), modification of, 45-46 chloramine T, 45-46 hydrogen peroxide, 45 Chloramine T, 45, 75 N-Chlorosuccinimide, 46, 47 Chymotrypsin, 111, 240 Chymotrypsin inhibitor CI2, 162, 165, 170, 181, 184, 186, 187 Citraconic anhydride, 31 Cold shock protein CspB, 166-169, 170, 180-181, 182, 183, 184, 187 Colicin, 90 Collagen, 164, 220, 263-264 Collagenase, 121 COMBINE, 19 Concanavalin A, 121, 123 Conformational change, 5 CONTIN, 282 Coulomb's Law, 79, 142 Cryocooling, 8 Crystal classes, 7 Crystallin, 194-195, 202
Crystallography of proteins, 1-22 (see Protein crystallography) Cyanide, 41 Cyclohexanedione, 34, 36 Cystamine, 40 Cytochrome c, 162, 175, 176, 177, 187-188, 240 Cyclosporin A (CsA), 163-164 Cytoplasmic cyclophilin 18, 163-164 DAMPS, 68 DelPhi, 81 DENZO, 10 Dicyclohexylcarbodiimide, 36 Diethylpyrocarbonate, 29, 33, 34 Differential scanning calorimetry (DSC), 238-240 Diffractometers, 9 Dihydrofolate reductase, 86, 164, 184 Dimethylamine borane, 30 Dimethylsuberimidate, 31 Dipyridyldisulphide, 42, 43 5,5'-Dithio(2-nitrobenzoic acid), 38-43 Dithiothreitol (DTT), 43-45, 257 DNA polymerase I, 123, 124 Domain, 190-196, 199, 201, 204, 240 DSC (see Calorimetry) Dynamic light scattering (DLS), 277-282 CONTIN, 282 "Dynamic Zimm plot," 281 fixed-angle (90°) DLS photometer, 277-278, 280-281 MHKS relation, 280 limitations, 278 as method of choice, 277 multiangle instruments, 281-282 cuvet clarity critical, 282 Polydispersity Factor, 282 process, 278-280 PROTEPS, 282 vs. static light scattering, 277 EF-hand structure, 113
Electro-optics, 298 Electron density, 3, 11, 14, 19, 104 Electrophoretic methods, 273 Electrostatic effects in proteins, 61-97 experimental approaches, 69-74 electrophoresis, 70-71 ion exchange chromatography, 69-70, 71 isoelectric focusing, 72-73 linkage theory, 74 nuclear magnetic resonance (NMR) spectroscopy, use of, 74 site titrations, individual, 74 experimental studies, examples of, 74-78 catalytic mechanisms, 74-75 lipids, interactions with, 78 long-range interactions, 75-76 protein charge ladders, 77-78 single-site titrations, 77 site-directed mutagenesis, 76-77 surface charge, determining, 77-78 introduction, 62-64 computer models, use of, 63 site-directed mutagenesis, 63 principles, basic, 64-69 amino acids in proteins, seven with ionizable side chains, 64-66 dielectric constant, 67-68 dipoles in proteins, 67 ionizable groups in proteins, 64-67 ionization and pKa values, 64 isoelectric point (pI), 67 isoionic point, 67 pKa values of ionizable groups, factors influencing, 68-69 potential surfaces, 68, 69 self-energy terms, 69 theoretical approaches, 78-84 boundary element method, 81
DelPhi, 81 finite difference method, 81, 89, 91 finite element method, 81, 91 free energies and pKa values, 82-83 GRASP, 82 grid spacing, accuracy dependent on, 81, 89 Henderson-Hasselbalch equation, 83 macroscopic (continuum) vs. microscopic models, 78-79 multigrid method, 81 Poisson-Boltzmann equation, solutions of, 79-82 Tanford-Kirkwood model, 79-81, 85 Tanford-Roxby approach, 83-84 titration curves, calculating, 83-84 theoretical studies, examples of, 84-92 colicins, calculations on, 90 enzyme mechanisms, 89 free energies of ligand binding, 86-88 heteropolymer collapse, 84 hormone-receptor interactions, 91 ligand binding, kinetics of, 88-89 Linderstrom-Lang model, 84 lipids, interactions with, 90-91 phospholipases A2, calculations on, 90-91 prospects, future, 91-92 protein stability, 84-85 pKa values, estimating, 85-86 redox potentials, 90 Electrostatic forces, 19, 62-92, 100, 224, 245, 251-253 ELLIPS series of PC programs, 301 Enthalpy, 227-232, 253 (see also Thermodynamics) and ligand binding, 248-250
"Enthalpy-entropy compensation" phenomenon, 182, 232 Entropy, 228-232, 253 (see also Thermodynamics) and ligand binding, 248-251 Equipartition theorem, 233 Ethyl acetimidate, 31 1-Ethyl-3-(3'-N,N-dimethylaminopropyl)carbodiimide, 36 Ethylenediamine, 36 N-Ethylmaleimide, 26, 38, 39 Exon, 191 Eyring equation, 182 Fibrinogen, 277, 278 Fluorescence depolarization, 298-301 Formaldehyde, 30, 121 Friedel law, 15 Frictional ratio, 296-297 FRODO, 18, 141 G-protein, 140 Gel permeation chromatography, 274-277 Gibbs Free Energy, 228, 239, 248 (see also Thermodynamics) Glutathione, 44, 188 Glycine ethyl ester, 36 GRASP, 82 GRID, 141 Harker diagram, 12 Heisenberg uncertainty principle, 224 Helix dipole, 67 Hemerythrin, 116, 131, 135 Henderson-Hasselbalch equation, 83 High-performance liquid chromatography (HPLC), 274 HIV integrase, 6 HYDRO, 295 Hydrodynamics, protein, 271-305 abstract, 272 hydrodynamic techniques, 273 electrophoretic methods, 273
introduction, 272-273 mass analysis, 273 rapid and nondestructive, 272 shape measurement, 273 X-ray crystallography and NMR, as support to, 272 molar mass (molecular weight) and quaternary structure, 273-291 calibrated gel chromatography, 275-277 dynamic light scattering (DLS), 277-282 (see also Dynamic light) gel filtration and size exclusion chromatography, 274-277 gel permeation chromatography, 274-277 MHKS relation, 280 Polydispersity Factor, 282 refractometers, 275 SEC/MALLS, 277 sedimentation equilibrium, 286-291 (see also Sedimentation) sedimentation velocity in analytical ultracentrifuge, 282-286 (see also Sedimentation) size exclusion chromatography, 274-277 static light scattering, 277 for unglycosylated polypeptide, 273 shape measurement, 291-301 combined shape functions, 297 concentration dependence parameters, use of, 297-298 conformational analysis, computer programs for, 301 ELLIPS series, 301 fluorescence depolarization decay, 298-301 frictional ratio, 296-297 HYDRO, 295
intrinsic viscosity, 294-296 "low-resolution information," advantages of, 291-292 Ostwald viscometer, 294 Perrin function, 297, 299 radius of gyration Rg, 297-298 rotational hydrodynamic shape functions, measurement and use of, 298-301 sedimentation velocity and dynamic light scattering, 296-297 SOLPRO, 295, 301 Stokes sphere, 292 strategies: spheres, ellipsoids, beads, and bends, 292-294 TRV modeling program, 298 uniqueness problem, 293-294 Hydrogen bonding, 100, 102, 141, 172, 220, 223, 224-225, 234-235, 244-246, 263, 264 Hydrogen peroxide, 45, 48 Hydrophobic interactions, 56, 142, 146, 172, 182, 201, 223, 225, 243-246, 264 Hydroxylamine, 29, 33, 34, 49 N-Hydroxysuccinimide, 37 N-Hydroxysulfosuccinimide, 37 Image plates, 10 Immunoglobulin, 162, 192-194, 201 Integrins, 120 Internal energy, 227 (see also Thermodynamics) Iodination, 46-48 Iodoacetamide, 39 Iodoacetic acid, 39, 188 Iodoacetic anhydride, 33 Ion exchange chromatography, 69-70, 71 (see also Electrostatic) Ions, binding of to proteins, 99-152 abstract, 99-100 anion binding to protein functional groups, 129-141
in alkaline phosphatase, 129-130, 133 aluminum fluoride, 140 carbonic anhydrase, 135, 137, 139 carboxylate site, 139 fluorine and fluorides, 141 hydrogen bonds, number of, 129, 132 cation binding in proteins, examples of, 118-126 alkaline phosphatase, 124 azurin, 121 carbonic anhydrase, 118-120 concanavalin A, 121-123 enzymes binding two different metal ions, 121-123 Klenow fragment, 123-124 magnesium-binding site, 124-125 mandelate racemase, 118-119 methanol dehydrogenase, 121, 122 pyrrolo-quinoline (PQQ) group of methanol dehydrogenase, 121 zinc, 118, 120, 121, 124-125 introduction, 100-105 amino-acid side chains, 101 conditions for, 100 displacement parameter, 104-105 electron-density values, 104-105 negatively charged groups, 102 positively charged ions, 102 proteins and ligands, three types of interactions between, 100-101 resolution of structure determination, 103-104 X-ray diffraction studies as source, 102-103 ion migration in proteins containing more than one metal, 126-128 D-xylose isomerase, 126-128 metal ion binding to protein functional groups, 105-118
ammonium ions, 118 in body, 106 calcium ions, 106, 113 Cambridge Structural Database, 107-108, 114 carboxyl groups of aspartic and glutamic acids, 107-109 copper ions, 114-115 "EF hand" structure, 113, 114 enzymes binding metal ions, two, 107 histidine side chains, 110, 113 iron ions, 116-118 Irving-Williams series, 107 Jahn-Teller effect, 115 magnesium ions, 106, 112-113 potassium ions, 106 rubidium ions, 118 selectivity of binding sites, 107 sodium ions, 106 "softer" metal ions, 107 which metal ions likely to bind with which group, 112 zinc ions, 112, 113-115, 118 prediction of ion-binding sites, methods of, 141-147 Cambridge Crystallographic Database, 146 cation-to-anion radius ratio, 143, 144 "composite (average) crystal-field environments," 146 Coulomb's Law, 142 D-xylose isomerase, 141 FRODO, 141 GRID, 141 hydrophilic groups within "shell" of hydrophobic groups, 142 Lennard-Jones, 141 probes, 145 Irving-Williams series, 107 Isoaspartate residues, 37 Iso-1-ferricytochrome c, 77
Isoelectric point (pI), 67 Isoionic point, 67
Jahn-Teller effect, 115
Kinetic traps, 155 α-Lactalbumin, 172, 174, 250, 255, 261-262 β-Lactamase, 163 Lactoperoxidase, 48 Lambert-Beer law, 287, 288 Laue method, 9 LeChatelier's principle, 225-226, 247 Lennard-Jones, 141 Luciferase, 197 Lysozyme, 84, 85, 129, 132, 162, 174, 175, 180, 240, 243, 249, 251, 252, 256-257, 260-263, 278
Maleic anhydride, 31 Mannitol-1-phosphate dehydrogenase, 201 Melittin, 78 Mercaptoacetic acid, 45 β-Mercaptoethanol, 39, 41, 44, 45, 277 Metal ions, binding of to proteins, 105-118 (see also Ions) Methane monooxygenase, 131, 135 Methanol dehydrogenase, 121, 122 Methionine repressor, 257-258 Methionine sulfoxide, 45 Methyl acetimidate, 31 N-Methylmercaptoacetamide, 45 Methyl methanethiosulfonate, 39, 43 MHKS relation, 280 Microgravity, 5 Miller indices, 7, 10 Mitochondrial proteins, 164 Molecular dynamics, 18, 74, 83, 84, 87, 91-92 "Molten globule," 84, 171-174, 187, 218, 222, 255-256, 262 (see also Protein folding) Myoglobin, 85, 172, 173, 240 Myosin, 264, 277, 278, 294, 298
NMR (see Nuclear magnetic resonance spectroscopy) Neutron diffraction, 104 2-Nitro-5-thiocyanobenzoate, 41 Nitrous acid, 33 Nuclear magnetic resonance (NMR) spectroscopy, 2, 63, 74, 77, 85, 102, 169, 171, 173, 174, 188, 261, 262, 272, 287, 291, 298 Nucleation, 3
Octopine dehydrogenase, 199-200 n-Octyl glucoside, 5 Oscillation camera, 9 Ostwald viscometer, 294
Patterson function, 11, 12 Periodate, 33 Perrin function, 297, 299 Phase problem, 11, 12 Phenylarsine oxide, 43, 44 Phenyl glyoxal, 34, 35 6-Phosphogluconate dehydrogenase, 129, 130 Phosphoglycerate kinase, 259 Phospholipase A2, 89, 90-91 Photon detection, 9 Poisson-Boltzmann equation, 79-86 (see also Electrostatic) Posttranslational modifications, 6, 25 Procaricain, 76 Prolyl isomerase, 163, 165 Prolyl peptide bond isomerization in protein folding, 161-164 (see also Protein folding) Protease, 5, 199 Protein crystallography, 1-22 abstract, 2 crystallization, 3-7 agent, three types of, 4
aggregation, 6 batch method of, 6 Biological Macromolecule Crystallization Database (BMCD), 4 detergents, nonionic, use of, 5 entropically unfavorable, 3 Fab fragment of antibody, addition of, 5 factors affecting, 4-6 growth, cessation of, 3 growth phase, 3 "hanging drop" vapor diffusion, 6 HIV integrase, 6 least well understood, 3 metal ions, addition of, 5 method types, three, 5, 6-7 nucleation, 3 process, 3-4 protein engineering, 6 protein ligand, presence of, 5 seeding methods, use of, 5 "sitting drops" vapor diffusion, 6 sparse-matrix approach, 7 stages, three, 3 statistical analysis, 7 "streak" seeding, 5 vapor diffusion, 6 variants within protein, 5-6 data collection, crystallographic, 8-10 advantages, three, of X-rays from synchrotrons, 9 charge-coupled devices (CCD), 10, 19-20 data reduction and processing, 10 DENZO, 10 detection of diffracted X-rays, 9-10 flat detectors, 10 position-sensitive photon detection, 9-10 screenless oscillation photography, 9
screenless precession, 9 synchrotron sources, 8-9 white radiation, 9 XDS, 10 Xengen, 10 diffraction of X-rays, 7-8 detection, 9-10 Bragg's law, 8 Miller indices, 7, 10 final model and validity of structure, 19-20 COMBINE, 19 omit maps, calculating, 19 SIGMAA, 19 XPLOR, 19 introduction, 2-3 Brookhaven Protein Databank, 2 crystals, suitable, growth of as prerequisite, 2 NMR methods, 2 technological developments, 2 map improvements, 16-17 density modification, 16 maximum entropy techniques, 16-17 SOLOMON, 16 solvent flattening, 16 SQUASH, 16 phase determination, methods of, 10-16 anomalous dispersion/scattering, 14-16 Friedel law, 15 molecular replacement, 11 multiple isomorphous replacement, 12-14 Patterson map/function, 11-12 phase problem, 11 structure refinement, 17-19 automatic refinement program, 19 FRODO, 18 gradient descent, 18 molecular dynamics, 18 Protein disulphide isomerase, 188, 290
Protein engineering, 6, 128, 184-186 Protein folding, 73, 153-215 abstract, 154 concluding remarks, 203-204 cooperativity of, 156-160 dissociation reaction, 157 kinetics of two-state folding transitions, 157-160 two-state approximation, 156-157 disulfide-containing proteins, folding of, 188-189 bovine pancreatic trypsin inhibitor (BPTI), example of, 188-189 folding intermediates, 170-176 apo-α-lactalbumin, 172 apo-myoglobin, 173 characterization of, difficult, 171 equilibrium intermediates: "molten globule," 171-174 kinetic intermediates, relation of with equilibrium intermediates, 173-174 kinetic models, 170 "molten globule," 171-174 protein engineering, characterization of kinetic folding intermediates by, 174 Roder's model, 175-176 role of for folding, 175-176 rollover in folding rate, 175-176 staphylococcal nuclease, 172, 173 ubiquitin, 175-176, 186 introduction, 154-156 in amino acid sequence, 154 chaperone-mediated folding, 155, 156 kinetic consistency principle, 156 "second half of genetic code," 154 speed of folding, 154-155 thermodynamic vs. kinetic control, 155-156, 157-160 large monomeric and oligomeric proteins, folding of, 199-203
monomeric, 199-202 monomeric, speculations about, 201-202 octopine dehydrogenase (ODH), 199-200 oligomeric, 202-203 tryptophan synthase, 200-201 tsf (temperature-sensitive for folding) mutations, 203 oligomeric proteins, folding and association of, 197-198 arc repressor dimers, 197-198 large, 202-203 prolyl peptide bond isomerizations as source for complex kinetics, 160-164 cytoplasmic cyclophilin 18, 163-164 fast- and slow-folding reactions, 160-161 FK506, 164 isomerization of peptide bonds not preceding proline, 163 parvulins, 164 prolyl isomerase, catalysis by, 163-164 in protein folding, 161-163 in trans or cis conformation, 161 UF and US molecules, 162 unfolded state, conformational heterogeneity of, 160 Xaa-Pro peptide bonds, 161-162 rate-limiting events in, 176-188 activation barrier, height of, 176 criteria, experimental, for activation-controlled processes, 177-178 CspB, 180-184, 187 cytochrome c, 187-188 Eyring equation, 182 intermediates and activated states, 186-187 kinetic methods, new, to follow folding in submilliseconds
range, 187-188 kinetics and equilibrium stability, correlation of, 178 m value and heat capacity, changes in, 178-184 mutational analysis of activated states, 184-186 properties of activated states, analysis of, 178-186 protein engineering method, 184-186 stopped-flow mixing techniques, use of, 187-188 transition state theory, 176-177 two-domain proteins, folding of, 190-196 calorimetry, use of, 191-192 γB-crystallin, 194-196 domain, meaning of, 190-191 immunoglobulin light chain, 192-194 problems with, 190 two-state folding reactions, 165-170 "chevron," 165, 169, 180 chymotrypsin inhibitor CI2, 165-166 cold-shock protein CspB, 166-169 small and fast-folding proteins, other, 169-170 "Protein-Folding Problem," 219 (see also Thermodynamics) Protein hydrodynamics, 271-305 (see also Hydrodynamics) Proteins, binding of ions to, 99-152 (see also Ions) Proteolysis, 5, 25, 190, 199 PROTEPS, 282 Pyridine borane, 30 Pyridoxal phosphate, 30 Pyruvate kinase, 121 Radioiodination, 30, 47 Random coil, 219-220 Refractometers, 275
λ-Repressor, 169-170, 187, 188 R-factor search, 11 Rhodopsin, 264 Ribonuclease T1, 87, 162, 163, 176, 189, 192 Ribonuclease A (RNase A), 160, 162, 173, 175, 189, 240 Scanning microcalorimetry, 165 SDS-PAGE, 72 SEC/MALLS, 277 Sedimentation equilibrium, 286-291 absorption system, 287 computing packages, 291 interference system, 287 Lambert-Beer law, 287, 288 MSTAR, 291 self-association, 289-290 thermodynamic nonideality, 289, 290 not transport method, 287 Sedimentation velocity in analytical ultracentrifuge, 282-286 absorption system, 283-284 DCDT, 284 extrapolation to zero concentration, 284-286 Gralen parameter, 286 sedimentation-diffusion method, 282 SVEDBERG, 284-286 XLA-VEL, 283-284 Seeding, 5 SIGMAA, 19 Site-directed mutagenesis, 6, 14, 63, 73, 74, 76, 174, 178, 184-186, 198, 254 Size exclusion chromatography, 274-277 Sodium borohydride, 30 Sodium cyanoborohydride, 30 SOLPRO, 295, 301 Space Group, 8 Spectrin, 187
SQUASH, 16 Staphylococcal nuclease, 161, 172 Static light scattering, 277 Stokes sphere, 292 Stopped-flow spectroscopy, 160, 165, 187-188 Structure factor, 10, 12, 17 Subtilisin, 76 Superoxide dismutase, 121 Symmetry, 7, 10, 11, 16 Synchrotrons, 8-10 (see also Protein crystallography) T4 lysozyme, 181 Tail spike protein, 202-203 Tanford-Kirkwood model, 79-80, 85 Thermodynamics of protein folding and stability, 217-270 abstract, 218 crosslinking, effects of, 259-263 fibrous proteins, 263-264 collagen family, 263-264 myosin/tropomyosin family, 264 finale, 265 introduction, 218-237 Boltzmann probability formula, 227-229 bonded interactions, 223 conformational states, 220-223 enthalpy, 227-232 "enthalpy-entropy compensation" phenomenon, 232 entropy, 228-232 equipartition theorem, 233 First Law of Thermodynamics, 227 Gibbs Free Energy, 228 Heisenberg uncertainty principle, 224 hydrogen bonds, 224-225, 245, 246 hydrophobic interactions, 225, 246 interactions, 223-225
internal energy, 227 Laws of Thermodynamics, 227, 231 LeChatelier's Principle, 225-226, 247 London dispersion forces, 223-224, 246 magnitude of heat capacity, 230 nonbonded (noncovalent) interactions, 223 "Protein-Folding Problem," 219 "random coil," 219-220, 222-223 "salt bridge," 224 Second Law of Thermodynamics, 227 semantics, definitions, and general considerations, 219-225 thermal energies and fluctuations, 233-234 thermodynamics, 225-233 Third Law of Thermodynamics, 231 two-state approximation, 234-237 van der Waals dispersion forces, 223-224 van't Hoff enthalpy equation, 232-233, 236-237, 239-240 "w" term, 229 ligand-binding, effect of on, 247-255 denaturant, 253-254 electrostatic interactions, 251-253 linked thermodynamic functions, theory of, 251 osmolytes, 254-255 pH, effect of, 250-251, 252 membrane proteins, 264 "molten globules" and other nonnative states, 255-256 nonglobular proteins, 263-264 (see also ...fibrous...) notes, 26 reversibility, 256-259 DSC thermogram, 256-259
DTT reducing agent, addition of, 257 thermal aggregation, 258 of unfolding: reversible globular proteins, 238-247 cold denaturation, 243 differential scanning calorimetry, 238-240 empirical data, 240-243 molecular interpretation, 243-247 Thioredoxin, 77, 86, 90 Titin, 277 Transamination, 33 Trinitrobenzene sulfonate, 32 Tropomyosin, 264 Trp repressor, 198 TRV modeling program, 298 Trypsin inhibitor, 188-189 Tryptophan synthase, 184, 199, 200
Ubiquitin, 175-176, 186, 240
Unit cell, 7, 8, 10, 11, 16, 17 van der Waals forces, 19, 62, 103, 223-224, 245, 246 (see also Thermodynamics) van't Hoff enthalpy equation, 232-233, 235, 236-237, 239-240 (see also Thermodynamics) White radiation, 9
X-ray crystallography/diffraction, 120, 102, 104, 161, 163, 190, 244, 272, 291 XDS, 10 Xengen, 10 XPLOR, 19 D-Xylose isomerase, 126-128, 129, 141, 145
Zinc finger proteins, 113-114